Methods to assess quality of microarrays

ABSTRACT

The present invention relates to methods and compositions for assessing the quality of microarrays. In particular, the invention relates to the use of quality control probes that are synthesized on the microarray monomer by monomer in a step-by-step synthesis. By assessing the degree of signal from the quality control probes and determining their deviation from expected signal intensities, the quality of microarray synthesis can be ascertained. The invention further relates to a method of detecting defects occurring during storage or processing of the microarray. The invention further relates to a method of using a computer to identify microarrays that have had a defect or defects during synthesis, storage, or processing.

This application claims priority to U.S. Provisional Application Ser. No. 60/392,629, filed Jun. 28, 2002, which is incorporated herein by reference in its entirety.

1. FIELD OF THE INVENTION

The present invention relates to methods and compositions for assessing the quality of microarray synthesis. The invention further relates to a method of detecting defects occurring during storage or processing of the microarray. In particular, the invention relates to the use of quality control probes that are synthesized on the microarray for assessing microarray quality. The invention further relates to a method of using a computer to identify microarrays that have a defect or defects, e.g., arising during synthesis, storage, or processing.

2. BACKGROUND OF THE INVENTION

DNA array technologies have made it possible, inter alia, to monitor the expression levels of a large number of genetic transcripts at any one time (see, e.g., Schena et al., 1995, Science 270:467-470; Lockhart et al., 1996, Nature Biotechnology 14:1675-1680; Blanchard et al., 1996, Nature Biotechnology 14:1649; Shoemaker et al., U.S. patent application Ser. No. 09/724,538, filed on Nov. 28, 2000). DNA array technologies have also found applications in gene discovery, e.g., in identification of exon structures of genes (see, e.g., Shoemaker et al., U.S. patent application Ser. No. 09/724,538, filed on Nov. 28, 2000; Meltzer, 2001, Curr. Opin. Genet. Dev. 11(3):258-63; Andrews et al., 2000, Genome Res. 10(12):2030-43; Abdellatif, 2000, Circ. Res. 86(9):919-20; Lennon, 2000, Drug Discov. Today 5(2):59-66; Zweiger, 1999, Trends Biotechnol. 17(11):429-36).

By simultaneously monitoring tens of thousands of genes, microarray technologies have allowed, inter alia, genome-wide analysis of mRNA expression in a cell or a cell type or any biological sample. Aided by sophisticated data management and analysis methodologies, the transcriptional state of a cell or cell type as well as changes of the transcriptional state in response to external perturbations, including but not limited to drug perturbations, can be characterized on the mRNA level (see, e.g., U.S. Pat. No. 6,203,987; Stoughton et al., International Publication No. WO 00/24936 (published May 4, 2000); Stoughton et al., International Publication No. WO 00/39336 (published Jul. 6, 2000); Friend et al., International Publication No. WO 00/24936 (published May 4, 2000)). Applications of such technologies include, for example, identification of genes which are up-regulated or down-regulated in various physiological states, particularly diseased states. Additional exemplary uses for DNA arrays include the analyses of members of signaling pathways, and the identification of targets for various drugs. See, e.g., Friend and Hartwell, International Publication No. WO 98/38329 (published September 3, 1998); Friend and Stoughton, International Publication No. WO 99/59037 (published Nov. 18, 1999); U.S. Pat. Nos. 6,132,969; 5,965,352; 6,218,122.

A microarray is an array of positionally-addressable binding (e.g., through hybridization) sites on a support. Each of such binding sites comprises a plurality of biopolymer molecules of a probe bound to a predetermined region on the support. Microarrays can be fabricated in a number of ways, including immobilization of pre-synthesized probes on the support or the in situ synthesis of probes on the support. For example, immobilization of pre-synthesized probes can be done robotically as described in DeRisi et al. (1997, Science 278(5338):680-6) or by inkjet. In situ synthesis can be accomplished by different means, including using inkjet technology or by light-activated synthesis (Holmes et al., 1995, Biopolymers 37(3):199-211; Jacobs et al., 1994, Trends Biotechnol. 12(1):19-26; Fodor et al., 1991, Science 251(4995):767-73). In either case of in situ synthesis, chemical reactions take place on the support in which a monomer or monomers are added to the biopolymer. As the biopolymer chain grows, however, there is a chance that one or more of the synthesis cycles may fail (either fully or partially), thereby producing a probe that lacks one or more of the intended monomers. Synthesis efficiency depends on multiple factors including reagent purity, reaction time, correct alignment of the inkjet head, etc. Defects in any of these processes can result in inefficient addition of a monomer or monomers to the growing biopolymer chain.

In addition, in the case of an inkjet-synthesized microarray, a synthesis defect may also occur when one of the nozzles of the inkjet head fails to deliver a reagent properly (e.g., if the nozzle becomes temporarily or permanently obstructed). A nozzle failure refers to any malfunction of an individual inkjet nozzle. If a nozzle fails to deliver the desired solution required for biopolymer addition, it is sometimes referred to as being “clogged.” A nozzle failure can occur at any point during microarray synthesis. A failure at the beginning of the synthesis may be due to insufficient priming of new reagents through the nozzles. A nozzle failure can also occur after the printing of a set of microarrays has begun if, e.g., there are trapped air bubbles or particulates. Nozzle failures can be detected and corrected before a microarray is synthesized. Before the start of each synthesis batch and at the end of each synthesis batch, every nozzle on the printhead can be tested to make sure that it is properly functioning. This can be done by placing a clean substrate on top of the head assembly before forcing each nozzle to extrude a small amount of liquid. If all nozzles are working properly, there will be a drop of liquid corresponding to each nozzle. If, however, one or more nozzles is malfunctioning, the drop corresponding to that nozzle position is missing. Because of the small size of the drops, a nozzle failure can occasionally be overlooked due to human error, and an array will be synthesized that shows evidence of a nozzle failure. Currently there exists a need for a more reliable method to determine if synthesis failures have occurred and, if so, where and when they happened during the course of microarray synthesis. Whereas it is possible to perform quality control on pre-synthesized probes by conventional DNA sequencing, by mass spectroscopy, or by other means, methods to assess the quality of probes synthesized in situ are lacking.

This application describes a method designed to assess the quality of microarray synthesis. The herein disclosed invention describes methods for the design and production of quality control probes on the microarray and methods for analysis of the information obtained from microarray processing that permit the determination of the overall quality of synthesis as well as the identity of the synthesis cycle most likely to have been defective. This invention also includes a database that contains information concerning the position and identity of the quality control probes on the microarray.

Citation or discussion of a reference herein shall not be construed as an admission that such is prior art to the present invention.

3. SUMMARY OF THE INVENTION

The present invention relates to methods and compositions to assess the quality of microarrays where the biopolymer probes are synthesized on the array substrate monomer by monomer in a step-by-step synthesis. In particular, failures or inefficiencies in the deposition of individual synthesis cycles of the microarray are detected through the inclusion of quality control probes on the microarray. The quality control probes are synthesized onto the microarray concurrently with the other biopolymer probes and thus would also be subject to any synthesis failures or inefficiencies that may occur. By assessing the degree of signal from the quality control probes and determining their deviation from expected signal intensities, the quality of microarray synthesis can be ascertained.

In one embodiment, each group of quality control probes comprises the same predetermined binding sequence for which a binding partner exists in or is introduced into the sample to be contacted with the microarray for analysis. The synthesis of the predetermined binding sequence in each quality control probe is initiated during the step-by-step synthesis at sequential cycles of synthesis. By assessing the degree of binding of a biopolymer capable of binding to the predetermined binding sequence of the quality control probe, the quality of microarray synthesis can be determined. In another embodiment, the quality control probes do not comprise a predetermined binding sequence. A detectable signal is generated by the quality control probe itself rather than by a labeled binding partner binding to the predetermined binding sequence. This can be accomplished by, e.g., incorporation of one or more labeled monomers into the quality control probe, staining of the quality control probe with a fluorescent dye, etc.

In a preferred embodiment, the invention relates to methods of detecting synthesis failures on an oligonucleotide microarray. In a more preferred embodiment, the invention relates to methods of detecting synthesis defects including nozzle failures during the synthesis of an inkjet oligonucleotide microarray. In addition to synthesis failures, other defects that affect microarray quality can also be detected, e.g., those due to degradation of probes during storage or processing of the microarray.

The invention provides a positionally addressable array comprising a substrate to which are attached a plurality of different biopolymer probes, said different biopolymer probes in said plurality being situated at different positions on said surface and being the product of a step-by-step synthesis of said biopolymer probes on said substrate, said plurality of different binding probes comprising a plurality of quality control probes, the synthesis of said quality control probe having been initiated during said step-by-step synthesis at sequential cycles of synthesis. Each quality control probe in said plurality comprising a predetermined binding sequence preferably comprises the same predetermined binding sequence or alternatively a different predetermined binding sequence but with the same binding specificity or similar binding characteristics (e.g., bind to their respective binding partner with similar intensities under the same binding conditions). In one embodiment, predetermined binding sequences of different lengths can be used (e.g., a 25mer and a 24mer).

In one specific embodiment of the array, the sequence of each said quality control probe of said plurality consists of said predetermined binding sequence.

In another specific embodiment, the plurality of quality control probes comprise a second sequence consisting of a chemical structure contiguous with said predetermined binding sequence, wherein at least some of the quality control probes differ from other of the quality control probes in length of said chemical structure. In a specific embodiment, the chemical structure is a sequence of number 0 to N monomers contiguous with said predetermined binding sequence, and where N is a whole number equal to or greater than 1. In a specific embodiment, the biopolymer probes are oligonucleotides, said predetermined sequence consists of 25 nucleotides, and said biopolymer probes that are not said quality control probes consist of 60 nucleotides. In a specific embodiment, N is not greater than the number of monomers in said biopolymer probes on the array that are not said quality control biopolymer probes minus the number of monomers in said predetermined binding sequence. In another specific embodiment, the quality control probes comprise a greater number of monomers than biopolymer probes on the array that are not said quality control biopolymer probes. In a further specific embodiment, an array comprises 3, 10, 30, 60 or more of said quality control probes that differ in N. A particular embodiment is wherein N is 0, 20, and 35, respectively, for different quality control probes.
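The arithmetic behind the spacer embodiment can be sketched as follows. This is only an illustration, under the assumptions that one monomer is added per synthesis cycle, that the spacer is synthesized first beginning at cycle 1, and that the predetermined binding sequence follows immediately; the function names are hypothetical and are not part of the disclosure.

```python
def max_spacer_length(test_probe_len: int, binding_len: int) -> int:
    """Largest N such that spacer plus binding sequence is no longer than
    the probes on the array that are not quality control probes."""
    return test_probe_len - binding_len


def binding_cycles(spacer_len: int, binding_len: int) -> tuple[int, int]:
    """First and last synthesis cycle (1-based, inclusive) that add a monomer
    to the predetermined binding sequence. Assumes one monomer per cycle and
    a spacer that starts at cycle 1."""
    first = spacer_len + 1
    last = spacer_len + binding_len
    return first, last


def binding_sequence_hit(spacer_len: int, binding_len: int, failed_cycle: int) -> bool:
    """True if a failed cycle falls inside the binding sequence of the probe."""
    first, last = binding_cycles(spacer_len, binding_len)
    return first <= failed_cycle <= last


if __name__ == "__main__":
    # 60-nucleotide test probes and a 25-nucleotide binding sequence.
    print(max_spacer_length(60, 25))
    # Probes with N = 0, 20, and 35 and a failure in synthesis cycle 24.
    for n in (0, 20, 35):
        print(n, binding_cycles(n, 25), binding_sequence_hit(n, 25, 24))
```

With 60-nucleotide test probes and a 25-nucleotide binding sequence, the bound on N works out to 35, which matches the longest spacer in the particular embodiment where N is 0, 20, and 35.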

In yet another specific embodiment, the plurality of quality control probes comprise

(i) quality control probes whose sequence consists of said predetermined sequence; and

(ii) quality control probes that comprise a second sequence of number 0 to N monomers contiguous with said predetermined binding sequence, wherein at least some of said quality control probes differ from other of said quality control probes in the number of said monomers, and where N is a whole number equal to or greater than 1.

In various specific embodiments, the biopolymer probes are nucleic acids, proteins, or antibodies. Preferably the predetermined binding sequence is in the range of 10-40 nucleotides in length, and more preferably, is 25 nucleotides in length. In a specific embodiment, the predetermined binding sequence is SEQ ID NO:1 or SEQ ID NO:2 or a complement thereof.

In one embodiment, the biopolymer probes consist of a sequence in the range of 20-100 nucleotides.

Preferably, the length of the predetermined binding sequence of the quality control biopolymer probe is between 10% and 75% of the length of the biopolymer probes on the array that are not quality control probes.

In a specific embodiment, the predetermined binding sequence consists of 25 monomers, and the biopolymer probes on the array that are not said quality control probes consist of 60 monomers.

The invention also provides a method of determining if a positionally-addressable biopolymer array has a synthesis defect comprising the following steps in the order stated:

a) contacting an array of the invention with a sample comprising a binding partner that binds said predetermined binding sequence;

b) detecting or measuring binding between two or more of said quality control probes and said binding partner in the sample; and

c) comparing binding of said two or more of said quality control probes, wherein if said binding is similar, the absence of a synthesis defect between said sequential cycles of synthesis of said array is indicated.

The invention further provides a method of determining if a positionally-addressable biopolymer array has a synthesis defect comprising the following steps in the order stated:

a) contacting an array as described above containing the quality control probes comprising the 0 to N monomer contiguous sequence, with a sample comprising a binding partner that binds said predetermined binding sequence;

b) detecting or measuring binding between (i) two or more of said quality control probes that differ in the number of said monomers; and (ii) said binding partner in the sample; and

c) comparing binding of said two or more of said quality control probes, wherein if said binding is similar, the absence of a synthesis defect between said sequential cycles of synthesis used to synthesize said two or more quality control probes is indicated.

The invention further provides a method of determining if a positionally-addressable biopolymer array has a synthesis defect caused by a nozzle failure comprising the following steps in the order stated:

a) contacting the array of the invention with a sample comprising a binding partner that binds said predetermined binding sequence, wherein at least a portion of said plurality of quality control probes is arranged in a periodicity of P and wherein said array is synthesized by step-by-step synthesis using an inkjet printhead with P nozzles, wherein P is a whole number equal to or greater than 1;

b) detecting or measuring binding between two or more of said quality control probes and said binding partner in the sample; and

c) comparing binding of said two or more of said quality control probes in a periodicity of P, wherein if said binding is similar, the absence of a nozzle defect is indicated.
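The comparison in step (c) of the nozzle-failure method might be sketched as below. This is a hypothetical illustration only: it assumes that the quality control probe at position index i is printed by nozzle i modulo P, that intensities are positive numbers, and that "similar" means a ratio to the median nozzle signal within an assumed 0.5 to 2.0 window. The function name and the sample data are invented for the example.

```python
from statistics import mean, median


def suspect_nozzles(intensities, p, low=0.5, high=2.0):
    """Return the nozzle indices (0-based) whose quality control probes
    deviate from the typical signal.

    Assumption: the probe at position i was deposited by nozzle i % p.
    """
    groups = {}
    for idx, value in enumerate(intensities):
        groups.setdefault(idx % p, []).append(value)

    # Mean quality control signal per nozzle.
    nozzle_means = {nozzle: mean(values) for nozzle, values in groups.items()}

    # Use the median across nozzles as the reference so that a single
    # failed nozzle does not distort the baseline.
    reference = median(nozzle_means.values())
    if reference <= 0:
        raise ValueError("reference intensity must be positive")

    return sorted(
        nozzle
        for nozzle, value in nozzle_means.items()
        if not (low <= value / reference <= high)
    )


if __name__ == "__main__":
    # Invented intensities for a printhead with P = 4 nozzles.
    signal = [1000, 980, 30, 1010, 990, 1005, 25, 995, 1002, 970, 40, 1001]
    print(suspect_nozzles(signal, p=4))
```

Grouping by position modulo P is what makes the periodic layout useful: a failed nozzle shows up as a low signal repeated at every P-th quality control probe rather than as an isolated weak feature.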

In the foregoing methods, the comparing step can comprise determining the binding ratio of two of said two or more quality control probes, wherein said binding ratio is the amount of binding of a first of said two quality control probes with said binding partner, divided by the amount of binding of a second of said two quality control probes with said binding partner, and wherein said binding ratio between 0.5 and 2.0 indicates the absence of said synthesis defect.

In a specific embodiment, the foregoing methods further comprise before step (a) the step of synthesizing said array.

In a specific embodiment, the sample comprises (i) total cellular RNA or mRNA from one or more cells or a plurality of nucleic acids derived therefrom, and (ii) said binding partner, wherein said binding partner is not expressed by said cells.

The invention also provides a method of making a positionally-addressable array of a plurality of different biopolymer probes comprising synthesizing said plurality of different biopolymer probes on a substrate from monomers using a step-by-step synthesis such that each of said different biopolymer probes is attached to said substrate at a different position on said substrate, wherein said plurality of different biopolymer probes comprise a plurality of quality control probes, each quality control probe in said plurality comprising the same predetermined binding sequence, wherein the synthesis of said predetermined binding sequence in each of said quality control probes is initiated during said step-by-step synthesis at sequential cycles of synthesis. The array thus made can have the characteristics described above.

The invention further provides an oligonucleotide comprising a nucleotide sequence of SEQ ID NO:1 or SEQ ID NO:2 or the complement thereof.
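The binding ratio test can be written in a few lines. The sketch below assumes that the "amount of binding" is a positive, background-corrected signal intensity and that the bounds 0.5 and 2.0 are inclusive; neither assumption is stated in the text, and the function names are hypothetical.

```python
def binding_ratio(first_signal: float, second_signal: float) -> float:
    """Amount of binding of the first quality control probe divided by the
    amount of binding of the second."""
    if second_signal <= 0:
        raise ValueError("second_signal must be positive")
    return first_signal / second_signal


def no_defect_indicated(first_signal: float, second_signal: float,
                        low: float = 0.5, high: float = 2.0) -> bool:
    """True if the ratio lies between low and high (bounds assumed inclusive)."""
    return low <= binding_ratio(first_signal, second_signal) <= high


if __name__ == "__main__":
    print(no_defect_indicated(1200.0, 1000.0))  # ratio 1.2
    print(no_defect_indicated(300.0, 1000.0))   # ratio 0.3
```

Because the window is symmetric on a logarithmic scale (0.5 and 2.0 are reciprocals), the outcome does not depend on which of the two probes is chosen as the numerator.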

4. DESCRIPTION OF THE FIGURES

FIG. 1 illustrates an inkjet oligonucleotide microarray that was synthesized with three malfunctioning nozzles. Entire rows corresponding to nozzles 4, 15, and 20 were not synthesized due to nozzle malfunction.

FIGS. 2A-2B schematically illustrate the use of quality control probes with spacers to determine the synthesis quality of an oligonucleotide microarray. (A) The 25 nucleotide long probe was either synthesized directly onto the microarray or was attached to a spacer of varying lengths (i.e., 20 nucleotides or 35 nucleotides). (B) A synthesis error in synthesis cycle 24 is depicted and thus affects the sequence of monomers in the predetermined binding sequence in only the first two quality control probes shown. The solid line depicts the quality control probe and the dashed line depicts the spacer.

FIGS. 3A-3B schematically illustrate the use of staggered start quality control probes to determine the synthesis quality of an oligonucleotide microarray. (A) A series of 25 nucleotide quality control probes are synthesized directly on the microarray starting at synthesis cycle 1 through synthesis cycle 36. The only difference between the quality control probes is the synthesis cycle at which synthesis begins. (B) A synthesis error in synthesis cycle 29 is depicted and thus only affects the quality control probes in which synthesis cycle 29 was actually used to add a monomer to the sequence of the quality control probe (i.e., those quality control probes that begin synthesis at synthesis cycles 5-29). The bold line depicts the quality control probe and the thin line depicts synthesis cycles that had no monomer deposited.
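The staggered start logic of FIG. 3B can be checked with a short calculation. The sketch assumes one monomer per synthesis cycle and a total of 60 cycles (which is consistent with 25-nucleotide probes starting at cycles 1 through 36); the function name is hypothetical.

```python
def affected_start_cycles(failed_cycle: int, probe_len: int = 25,
                          total_cycles: int = 60) -> list[int]:
    """Start cycles of the staggered quality control probes whose synthesis
    uses the failed cycle.

    A probe that starts at cycle s adds monomers in cycles s .. s + probe_len - 1.
    """
    last_start = total_cycles - probe_len + 1  # 36 for a 25mer in 60 cycles
    return [
        start
        for start in range(1, last_start + 1)
        if start <= failed_cycle <= start + probe_len - 1
    ]


if __name__ == "__main__":
    hit = affected_start_cycles(29)
    print(hit[0], hit[-1], len(hit))
```

For a failure in cycle 29, the affected probes are those that start at cycles 5 through 29, in agreement with the figure description; probes starting at cycles 1-4 are already complete and probes starting at cycles 30-36 have not yet begun.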

FIGS. 4A-4B illustrate the use of quality control probes comprising a spacer to determine the synthesis quality of an oligonucleotide microarray when there were no known or detectable synthesis defects during oligonucleotide microarray synthesis. (A) Microarray image after hybridization to a fluorescently labeled oligonucleotide that hybridized to the quality control probes. (B) Higher magnification of the microarray in (A) that depicts the positions of the 25mer, 40mer, and 60mer.

FIGS. 5A-5B illustrate the use of quality control probes comprising a spacer to determine the synthesis quality of an oligonucleotide microarray when the first synthesis cycle was intentionally skipped during oligonucleotide microarray synthesis. (A) Microarray image after hybridization to a fluorescently labeled oligonucleotide that hybridized to the quality control probe. (B) Higher magnification of the microarray in (A) that depicts the positions of the 25mer, 40mer, and 60mer.

FIGS. 6A-6B illustrate the use of quality control probes comprising a spacer to determine the synthesis quality of an oligonucleotide microarray when the first and second synthesis cycles were intentionally skipped during oligonucleotide microarray synthesis. (A) Microarray image after hybridization to a fluorescently labeled oligonucleotide that hybridized to the quality control probe. (B) Higher magnification of the microarray in (A) that depicts the positions of the 25mer, 40mer, and 60mer.

FIGS. 7A-7B illustrate the use of quality control probes comprising a spacer to determine the synthesis quality of an oligonucleotide microarray when the thirty-sixth synthesis cycle was intentionally skipped during oligonucleotide microarray synthesis. (A) Microarray image after hybridization to a fluorescently labeled oligonucleotide that hybridized to the quality control probe. (B) Higher magnification of the microarray in (A) that depicts the positions of the 25mer, 40mer, and 60mer.

FIGS. 8A-8B illustrate the use of quality control probes comprising a spacer to determine the synthesis quality of an oligonucleotide microarray when the thirty-fourth and thirty-fifth synthesis cycles were intentionally skipped during oligonucleotide microarray synthesis. (A) Microarray image after hybridization to a fluorescently labeled oligonucleotide that hybridized to the quality control probe. (B) Higher magnification of the microarray in (A) that depicts the positions of the 25mer, 40mer, and 60mer.

FIGS. 9A-9B illustrate the use of quality control probes comprising a spacer to determine the synthesis quality of an oligonucleotide microarray when there was inefficient synthesis in the first twenty-two synthesis cycles during oligonucleotide microarray synthesis. (A) Microarray image after hybridization to a fluorescently labeled oligonucleotide that hybridized to the quality control probe. (B) Higher magnification of the microarray in (A) that depicts the positions of the 25mer, 40mer, and 60mer.

FIGS. 10A-10B illustrate the use of staggered start quality control probes to determine the synthesis quality of an oligonucleotide microarray when there was inefficient synthesis in the first and second synthesis cycles during oligonucleotide microarray synthesis. (A) Microarray image after hybridization to a fluorescently labeled oligonucleotide that hybridized to the quality control probe. (B) The mean fluorescence intensity plot of the quality control probes at each synthesis cycle.

FIGS. 11A-11B illustrate the use of staggered start quality control probes to determine the synthesis quality of an oligonucleotide microarray when there was inefficient synthesis in the first five synthesis cycles during oligonucleotide microarray synthesis. (A) Microarray image after hybridization to a fluorescently labeled oligonucleotide that hybridized to the quality control probe. (B) The mean fluorescence intensity plot of the quality control probes at each synthesis cycle.

FIGS. 12A-12B illustrate the use of staggered start quality control probes to determine the synthesis quality of an oligonucleotide microarray when there was inefficient synthesis in the first eight synthesis cycles during oligonucleotide microarray synthesis. (A) Microarray image after hybridization to a fluorescently labeled oligonucleotide that hybridized to the quality control probe. (B) The mean fluorescence intensity plot of the quality control probes at each synthesis cycle.

FIGS. 13A-13B illustrate the use of staggered start quality control probes to determine the synthesis quality of an oligonucleotide microarray when there was inefficient synthesis in the forty-fifth to sixtieth synthesis cycles during oligonucleotide microarray synthesis. (A) Microarray image after hybridization to a fluorescently labeled oligonucleotide that hybridized to the quality control probe. (B) The mean fluorescence intensity plot of the quality control probes at each synthesis cycle.

FIGS. 14A-14B illustrate the increased sensitivity of a single-deletion quality control probe. Microarrays with synthesis defects in the thirty-fourth and thirty-fifth synthesis cycles were synthesized with quality control probes either (A) without or (B) with an intentional single deletion in the predetermined binding sequence. The labeled reverse complement of the full-length 25 nucleotide predetermined binding sequence was used to hybridize with each microarray. The mean fluorescence intensity plot of the quality control probes at each synthesis cycle was determined for each microarray.

FIGS. 15A-15C illustrate correlations between fluor-reversed pairs for a microarray that had skipped the first twenty-two synthesis cycles during synthesis (A); a microarray that had no synthesis defect (B); and a microarray that had skipped the first twenty-two synthesis cycles during synthesis with a microarray that had no synthesis defect (C).

FIGS. 16A-16D illustrate correlations between oligonucleotide microarrays that had no synthesis defects with oligonucleotide microarrays that had the first (A), first and second (B), thirty-sixth (C), or thirty-fourth and thirty-fifth (D) synthesis cycles skipped during synthesis.

FIGS. 17A-17D schematically illustrate a microarray with quality control probes attached to the substrate. (A) outer gridline, (B) diagonal gridline, (C) internal cluster, (D) corner cluster.

5. DETAILED DESCRIPTION OF THE INVENTION

The object of the present invention is to assess the quality of microarray synthesis for arrays where the biopolymer probes are synthesized on the array substrate monomer by monomer in a step-by-step synthesis. This object is fulfilled by the synthesis of quality control probes on the microarray to be assessed. The quality control probes are synthesized in the same manner as, and in conjunction with, the other biopolymer probes on the microarray.

The quality control probes may comprise a predetermined binding sequence. This predetermined binding sequence has a binding partner that can be used to detect the presence of the predetermined binding sequence during microarray processing. In some embodiments, the quality control probe also comprises a chemical structure contiguous with the predetermined binding sequence (such chemical structure referred to herein as a “spacer”). The spacer is preferably a polymer (e.g., a sequence) of additional monomers attached to (contiguous with) the predetermined binding sequence. Upon completion of microarray synthesis, the quality control probes are detected by binding to a labeled binding partner. The degree of binding is quantified for each quality control probe and compared to the binding intensities of other quality control probes. Similar binding intensities indicate synthesis was equally efficient throughout the synthesis.

In another specific embodiment, the quality control probes do not comprise a predetermined binding sequence. In such an embodiment, the signal observed with this type of quality control probe is emitted either by 1) the monomers that make up the quality control probe directly or 2) a label (e.g., a dye) that interacts with or is attached to the monomers that make up the quality control probe. Deviation from the expected binding intensities indicates a defect in the array, e.g., due to a synthesis defect, or degradation during storage or processing.

Although the invention is generally described in terms of the use of one group of quality control probes, it will be understood that different groups of quality control probes can also be used on a single microarray. The different groups of quality control probes may have different predetermined binding sequences or may be a mixture of quality control probes with and without predetermined binding sequences. The quality control probes may also be a mixture of different lengths (e.g., a mixture of quality control probes comprising predetermined binding sequences of 25mers or 24mers).

5.1 Quality Control Probes with Predetermined Binding Sequences

5.1.1 Predetermined Binding Sequence

Quality control probes with predetermined binding sequences are biopolymers that comprise a predetermined binding sequence and do not interfere with the results of the intended microarray processing. So as to avoid cross-reactivity in binding, biopolymers of the sample to be assayed should not bind to the quality control probes on the microarray. Also, the reverse complement of the predetermined binding sequence used to bind to and detect the quality control probes should not bind to the test probes (i.e., probes on the microarray designed to bind biopolymers of the sample) on the microarray. In the method of the present invention, the quality control probe is made according to the particular requirements of the combination of origin, preparation, and processing of the sample to be analyzed on the microarray to be synthesized. Preferably, where the sample to be analyzed on the microarray comprises naturally occurring nucleic acids or proteins, the predetermined binding sequence of the quality control probes is not present or is not known to be present in any naturally occurring nucleic acid or is not known to encode any naturally occurring protein, respectively. In another embodiment, the predetermined binding sequence of the quality control probes is not present or is not known to be present in the sample. This is done to reduce the likelihood that the predetermined binding sequence will be cross-reactive. Cross-reactivity indicates that a biopolymer has the ability to interact (e.g., hybridize or bind) with more than one other biopolymer present during microarray processing. For example, during processing of an oligonucleotide microarray, if the predetermined binding sequence hybridizes with its complementary nucleic acid as well as with a different sequence in the biological sample, then the probe is said to be cross-reactive. Cross-reactivity in a probe is undesirable, since it could alter the signal intensities observed from sample processing and affect the assessment of microarray synthesis quality.

In one embodiment, the potential sequence of monomers that make up the predetermined binding sequence can be identified from a pool of randomly synthesized sequences. These potential predetermined binding sequences can then be assayed for their cross-reactivity with the biological sample to be processed or probes designed to detect naturally occurring sequences in the biological sample during processing. Preferably, predetermined binding sequences that are not substantially cross-reactive with biopolymers being assayed in the sample are used in quality control probes. Thus, at the time of microarray synthesis, the sequence of the quality control probes is known, although the sequence is random in that it had initially been the product of a random synthesis. The random sequences are biopolymer residues (e.g., nucleotide or amino acid residues) that are generated without a preplanned specific design as to the actual resulting sequence, i.e., when a monomer (e.g., nucleotide, amino acid) is said to be random it is unpredictable what monomer will occur at that residue. The random sequences can be synthesized by an unbiased synthesis scheme wherein each possible residue has an equal chance of being incorporated into the biopolymer at each position. Alternatively, the random sequences can be synthesized by a biased synthesis scheme wherein certain positions in the biopolymer have an increased chance of having one residue over another. Additionally, a combination of unbiased and biased synthesis methods can be used to synthesize any one biopolymer. In one embodiment, sequences on either end or at internal positions may be added to the predetermined binding sequence for the purposes of facilitating standard molecular biological manipulations. Once a randomly generated predetermined binding sequence has been produced, its sequence is determined. Preferably, the sequence is then tested for cross-reactivity, and recorded for future use. For each microarray, one or more of the predetermined binding sequences that have been empirically determined to be non-cross-reactive are then synthesized on the microarray to allow for future assessment of synthesis quality or other non-synthesis defects in the array.

In another embodiment, the predetermined binding sequence can be a naturally occurring sequence that is not endogenous to the sample that is to be processed on the microarray. For example, if the sample is from a eukaryotic source, then a bacterial sequence (or fragment thereof) can be used as the predetermined binding sequence. Cross-reactivity could be assessed as a precautionary measure.

Accordingly, where the binding partner to the predetermined sequence of the quality control probes is not endogenously present in the sample to be assayed for binding to the microarray, the binding partner to the predetermined binding sequence in the quality control probe is introduced into the sample at any time prior to or during contacting of the sample with the microarray. In one embodiment, the binding partner is added to the sample during sample processing. In a more preferred embodiment, the binding partner is added to the sample immediately prior to contact of the sample with the microarray.

The predetermined binding sequence can be made of any type of biological macromolecule; preferably the molecular nature of the quality control probe is consistent with that of the other biopolymer probes on the microarray. For example, the predetermined binding sequence can be composed of nucleotides (i.e., DNA or RNA), amino acids, glycans, saccharides, or small organic molecules.

In one embodiment, the predetermined binding sequence is a nucleic acid, preferably an oligonucleotide, and a nucleic acid microarray is contacted with a sample comprising a nucleic acid comprising a sequence complementary to the predetermined binding sequence under conditions conducive to hybridization, and the amount of hybridization to quality control probes is compared.

In another embodiment, the predetermined binding sequence is a protein (polypeptide or peptide), and a protein microarray is contacted with a sample comprising a binding partner to said protein under conditions conducive to binding, and the amount of binding to quality control probes is compared. In one embodiment, the binding moiety is an epitope recognized by an antibody, preferably a monoclonal antibody. Preferably, epitopes are unique (i.e., not endogenously expressed in cells or tissues that provide protein material for the samples) to minimize cross-reactivity of the antibodies directed to predetermined binding sequence epitopes with sample epitopes during detection.

The length of the predetermined binding sequence can vary depending upon the length of the other biopolymer probes on the microarray used to detect binding partners in the sample to be assessed. Typically, the predetermined binding sequence is composed of a smaller number of monomers than the other biopolymer probes on the microarray. This allows the predetermined binding sequence to represent only a subset of the total monomers that make up the other biopolymer probes on the microarray. As such, multiple predetermined binding sequences are needed to represent each full length biopolymer probe. This allows for different cycles of synthesis to be targeted for inspection by different quality control probes depending upon which cycles of synthesis were used to synthesize the predetermined binding sequence. Binding intensities can be compared between different predetermined binding sequences to ascertain information regarding the different portions of the full length biopolymer probes. The predetermined binding sequence is preferably between 5-95%, 10-75%, 25-65%, 35-60%, 40-55%, or 41-48% of the length of the other biopolymer probes on the microarray. In another embodiment, the predetermined binding sequence is 15 biopolymer residues when the other probes on the microarray are 60 biopolymer residues in length. In another embodiment, the predetermined binding sequence is 25 biopolymer residues when the other probes on the microarray are 60 biopolymer residues in length. The length of the biopolymer probes on the microarray that are not quality control probes, when nucleic acids, is preferably in the range of 10-500 nucleotides, more preferably 10-250, 20-100, 40-80, 50-70 or 60 nucleotides.
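As a rough illustration of the length arithmetic in the preceding paragraph, the following Python sketch computes what fraction of a full-length probe a predetermined binding sequence represents and how many such sequences would be needed to span the probe. The function names, and the assumption that the binding sequences tile the probe end to end without overlap, are illustrative only and are not taken from the specification.

```python
import math

def fraction_of_probe(binding_len: int, probe_len: int) -> float:
    """Fraction of the full-length probe covered by one binding sequence."""
    return binding_len / probe_len

def sequences_to_span(binding_len: int, probe_len: int) -> int:
    """Number of non-overlapping binding sequences needed to cover every
    synthesis cycle of a full-length probe (hypothetical tiling scheme)."""
    return math.ceil(probe_len / binding_len)

# The two worked lengths mentioned in the text, on 60-residue probes.
for binding_len in (15, 25):
    share = fraction_of_probe(binding_len, 60)
    count = sequences_to_span(binding_len, 60)
    print(f"{binding_len}-mer: {share:.1%} of the probe, {count} needed to span it")
# 15-mer: 25.0% of the probe, 4 needed to span it
# 25-mer: 41.7% of the probe, 3 needed to span it
```

Under these assumptions the 25-residue example falls inside the 41-48% range recited above.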

5.1.1.1 Predetermined Binding Sequences with Intentional Deletions

In some embodiments, the predetermined binding sequence has an intentional deletion of one or more monomers relative to a sequence that binds a binding partner used to detect the quality control probe during microarray processing. Thus, in a specific embodiment, the predetermined binding sequence has an internal deletion of a nucleotide relative to a sequence perfectly complementary to the nucleic acid used to detect the quality control probe by hybridization. Although this does decrease the signal intensity due to an imperfect binding pair, signal can still be observed. Any additional deletions due to a failure during microarray synthesis would exacerbate the difference between predetermined binding sequence and binding partner and thus serve to drastically reduce the signal observed during microarray processing. In one embodiment, on an oligo microarray, the predetermined binding sequence is a 24mer (i.e., has one monomer intentionally deleted) and the binding partner is a 25mer.

In one embodiment, each quality control probe on a microarray comprises a predetermined binding sequence comprising one or more such intentional deletions. In another embodiment, the quality control probes on a microarray are a mixture of those comprising predetermined binding sequences comprising one or more intentional deletions and those comprising predetermined binding sequences with no intentional deletions.

5.1.2 Spacers

In some embodiments, the quality control probes comprise a chemical structure contiguous with the predetermined binding sequence. This chemical structure is referred to herein as a spacer. The spacer is preferably made up of 0 to N monomers (e.g., nucleotides, amino acid residues), where N is a whole number equal to or greater than 1. Preferably, the spacers added are less than 75%, less than 50%, less than 25%, less than 20%, less than 15%, less than 10%, less than 5%, or less than 1% of the total sequence of the quality control probe. Spacers can be on one side of the predetermined binding sequence or on both sides. For nucleic acid probes, the spacers can be either 5′ or 3′ or both 5′ and 3′ to the predetermined binding sequence. In one embodiment, the spacers are exclusively 3′ to the predetermined binding sequence. For protein probes, the spacers can be either amino- or carboxy-terminal or both amino- and carboxy-terminal to the predetermined binding sequence. In a specific embodiment, the spacer is a nucleotide or protein sequence.

In one embodiment, the value of the upper limit of N is determined by the length of the biopolymer probes synthesized on the microarray that are not quality control probes (i.e., those not containing the predetermined binding sequence). The total length of the quality control probe is preferably not greater than the total length of the other biopolymer probes on the microarray. Therefore, in a specific embodiment, N plus the number of monomers in the predetermined binding sequence should equal the total number of monomers in the biopolymer probes on the array that are not quality control probes. In another embodiment, the value of N is not constrained by the length of the biopolymer probes synthesized on the microarray that are not quality control probes (i.e., those not containing the predetermined binding sequence). In this embodiment, quality control probes can be shorter or longer than the other biopolymer probes on the microarray.
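The constraint in the specific embodiment above (N plus the length of the predetermined binding sequence equals the length of the other probes) can be sketched in a few lines of Python. The function name is hypothetical, and the sketch covers only the constrained embodiment, not the one in which N is unconstrained.

```python
def spacer_length(test_probe_len: int, binding_len: int) -> int:
    """Return N, the number of spacer monomers that makes a quality control
    probe exactly as long as the non-quality-control probes on the array."""
    if binding_len > test_probe_len:
        raise ValueError("binding sequence is longer than the other probes")
    return test_probe_len - binding_len

# A 25-residue binding sequence on an array of 60-residue probes.
print(spacer_length(60, 25))  # 35
```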

Spacers are preferably not cross-reactive with the biopolymer being assayed in the sample. During microarray processing, preferably no signal is detected from the spacer. Additionally, the spacer should not interfere with the signal generated from the predetermined binding sequence binding to its binding partner during microarray processing. In one embodiment, interference with such signal is prevented because the chemical structure that makes up the spacer is modified such that the spacer is not able to bind a binding partner. For example, modified nucleic acids that are not competent to hybridize can be used in spacers and will be non-cross-reactive, e.g., abasic nucleotides (i.e., moieties lacking a nucleotide base, but having the sugar and phosphate portions) (see generally U.S. Pat. No. 6,248,878; Takeshita et al., 1987, J. Biol. Chem. 262:10171; abasic nucleotides are commercially available from Glen Research in Sterling, Va.). In another embodiment, spacers can be made of a chemical moiety that is different from the monomers present in the other biopolymer probes on the microarray not dedicated to quality control and/or the monomers that make up the predetermined binding sequence. For example, on a nucleotide microarray, spacers can be made from non-nucleotide moieties such as polyether, polyamine, polyamide, or polyhydrocarbon compounds. Specific examples include those described by Seela and Kaiser, 1990, Nucleic Acids Res. 18:6353; Seela and Kaiser, 1987, Nucleic Acids Res. 15:3113; Cload and Schepartz, 1991, J. Am. Chem. Soc. 113:6324; Richardson and Schepartz, 1991, J. Am. Chem. Soc. 113:5109; Ma et al., 1993, Nucleic Acids Res. 21:2585; Ma et al., 1993, Biochemistry 32:1751; Durand et al., 1990, Nucleic Acids Res. 18:6353; McCurdy et al., 1991, Nucleosides & Nucleotides 10:287; Jaschke et al., 1993, Tetrahedron Lett. 34:301; Ono et al., 1991, Biochemistry 30:9914; Ferentz and Verdine, 1991, J. Am. Chem. Soc. 113:4000; U.S. Pat. No. 6,362,323; International Publication No. WO89/02439.

Preferably, once generated, the entire quality control probe sequence is determined, tested for cross-reactivity, and recorded for future use.

5.2 Quality Control Probes without Predetermined Binding Sequences

In some embodiments, quality control probes do not have predetermined binding sequences but are made exclusively of a spacer. Signals observed with this type of quality control probe are emitted either 1) directly from the chemical structure (e.g., the monomers) that makes up the quality control probe or 2) indirectly through the use of a labeled dye which interacts with the chemical structure (e.g., the monomers) that makes up the quality control probe. These types of quality control probes can give off a signal without the use of a labeled binding partner.

Thus, in one embodiment, quality control probes are synthesized with labeled monomers. The labeled monomers can be, for example, fluorescently labeled (e.g., Cy3, Cy5) nucleotides or fluorescently labeled amino acids. Other labels that can be used include, but are not limited to, electron rich molecules and radioactive isotopes. Each quality control probe incorporates one or more labeled monomers during synthesis.

In a specific embodiment, the synthesis cycle in which the labeled monomer is incorporated into the quality control probe is varied with each quality control probe. Each cycle of synthesis is represented by at least one, but preferably more than one, quality control probe having a label in the monomer deposited in that synthesis cycle. Should a synthesis defect occur, no labeled monomer is incorporated and the deficiency can be detected. In a preferred aspect, each quality control probe is the same length.

In another specific embodiment, the quality control probe is made of the same number of monomers that make up the test probes on the microarray (i.e., those probes on the microarray that are not quality control probes) with one of the monomers being labeled.

In another specific embodiment, the quality control probes are of varying lengths such that there is at least one, but preferably more than one, quality control probe that terminates at each cycle of synthesis. In such quality control probes, the last monomer of each of the quality control probes is a labeled monomer.

In another embodiment, quality control probes are synthesized with no predetermined binding sequence using unlabeled monomers. The signal generated relies on the monomers' intrinsic ability to generate a signal, e.g., to fluoresce. Nucleic acid quality control probes of varying lengths can be synthesized and the microarray can be scanned prior to processing by hybridization to labeled probes. The degree of fluorescence observed should correlate with the length of the quality control probes due to an increased number of monomers (nucleotides) in longer probes.

In another embodiment, a labeled dye that directly binds to the monomers of the quality control probes can be used to generate a detectable signal. For example, for a nucleic acid microarray, various fluorescent nucleic acid stains can be used such as POPO, SYBR Green I, SYBR Green II, SYTO 59, and SYTO 61 (available from Molecular Probes, Inc. in Eugene, Oreg.). After assessing the microarray synthesis efficiency, the dyes can be removed prior to incubation of the microarray with test samples.

5.3 Quality Control Probe Synthesis on Microarrays

During a step-by-step biopolymer probe synthesis onto the microarray substrate, there can be faulty monomer addition at one or more cycles of synthesis at one or more areas on the microarray. To discern if such a synthesis error occurred, quality control probes of the invention are synthesized at different places on the microarray and the signals of the different quality control probes are compared. Significant signal deviation from what is expected indicates a synthesis defect (see Section 5.4).

5.3.1 Vertical Placement

In one embodiment, quality control probes that generate a detectable signal either by binding to a predetermined binding sequence or by incorporation of labeled monomers can be displaced from each other vertically to assess the efficiency of all cycles of synthesis. In one embodiment, synthesis of the predetermined binding sequences is initiated during the step-by-step monomer addition at different cycles of synthesis. Therefore, although each predetermined binding sequence for a group of quality control probes is identical, the cycles of synthesis creating the predetermined binding sequence on the microarray are displaced from each other in a vertical fashion. In another embodiment, the cycle of synthesis in which the labeled monomer is incorporated into the quality control probe is varied such that each cycle of synthesis should have incorporated a labeled monomer in at least one quality control probe. These methods can be used to pinpoint the cycle of synthesis that was affected by faulty monomer addition.

In one embodiment, this vertical displacement is accomplished through the use of spacers. For example, by varying the number of monomers in a spacer, the synthesis cycle of the microarray at which synthesis of the predetermined binding sequence begins will also vary. Consequently, this makes each predetermined binding sequence vulnerable to defects in monomer addition occurring at different cycles in the synthesis. Should there be no synthesis defects during microarray synthesis, then the binding partner of the predetermined binding sequence should bind equally well (i.e., similarly) to the predetermined binding sequence on all of the quality control probes. In determining whether the binding partner binds similarly to the predetermined binding sequence on the different quality control probes, it must be appreciated that, when the quality control probes comprise spacers, differences in binding may be due in part to the distance the predetermined binding sequence is from the microarray (see Section 5.4). The binding differences expected due to the different spacer lengths are thus preferably ignored when determining whether the different quality control probes are binding “similarly”.

In a specific embodiment, the quality control probes of a group all comprise identical predetermined binding sequences but differ in the overall number of monomers in the quality control probe due to a varying number of monomers comprising the spacers.
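The idea that spacer-displaced probes can pinpoint a faulty synthesis cycle can be sketched as follows. This is a minimal illustration, not the method of the specification: it assumes the spacer is laid down first (nearest the substrate), that cycles are numbered from 1, and that a single cycle failed; all names are hypothetical. Each probe's binding sequence occupies a window of cycles, and the suspect cycles are those shared by every low-signal probe and by no normal-signal probe.

```python
def binding_cycles(spacer_len: int, binding_len: int) -> set:
    """Synthesis cycles (1-based) that build the binding sequence when
    `spacer_len` spacer monomers are added first."""
    return set(range(spacer_len + 1, spacer_len + binding_len + 1))

def candidate_failed_cycles(binding_len, low_spacers, normal_spacers) -> set:
    """Cycles common to all low-signal probes and absent from every
    normal-signal probe (assumes one faulty cycle)."""
    if not low_spacers:
        return set()
    suspects = set.intersection(
        *(binding_cycles(s, binding_len) for s in low_spacers))
    for s in normal_spacers:
        suspects -= binding_cycles(s, binding_len)
    return suspects

# 25-residue binding sequence on a 60-cycle array, spacers of 0, 5, ..., 35.
# Suppose probes with spacers 5-25 read low and the rest read normally.
print(sorted(candidate_failed_cycles(25, [5, 10, 15, 20, 25], [0, 30, 35])))
# [26, 27, 28, 29, 30]
```

With spacers stepped by five monomers the fault is localized to a five-cycle window; a finer spacer step would narrow it further.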

In another embodiment, this vertical displacement is accomplished by varying the synthesis cycle of the microarray at which the labeled monomer is incorporated into the quality control probe. Consequently, this makes each labeled monomer addition vulnerable to defects in monomer addition occurring at different cycles in the synthesis. Should there be no synthesis defects during microarray synthesis, then each quality control probe should have incorporated an equal number of labeled monomers and thus will give comparable signals.

In another embodiment, this vertical displacement is accomplished with a staggered start synthesis. As above, each predetermined binding sequence is displaced in its start of synthesis with respect to each other by one or more sequential cycles of monomer addition. In one embodiment, spacers are used to accomplish this displacement. In a more preferred embodiment, spacers are not used to accomplish this displacement. Rather, monomer addition is delayed at the position on the microarray to be occupied by the predetermined binding sequence until microarray synthesis has reached the cycle at which synthesis of the predetermined binding sequence is to be initiated. In this embodiment, all quality control probes comprise the same number of monomers but the synthesis using these monomers at different positions on the microarray (corresponding to the quality control probes) is separated temporally.

5.3.2 Horizontal Placement

The quality control probes of the invention can be synthesized on the microarray substrate in a number of different locations to make up a number of different patterns. These patterns can be used to identify areas of microarray synthesis defects as well as to impart positional information to the microarray during processing. The number of quality control probes on a microarray should be sufficient to adequately represent the synthesis across the entire microarray. For example, the number of probes on the microarray that are quality control probes should be about 0.5% or more, 1% or more, 2% or more, 3% or more, 5% or more, 10% or more, or 20% or more of the total probes on the microarray.

In one embodiment, one or more rows of quality control probes (called gridlines) can be synthesized at different positions throughout the microarray. Each section of the microarray can contain a gridline to ensure that all sections have been assessed for competent synthesis. In one embodiment, the integrity of biopolymer probe synthesis at the edge of the microarray can be monitored through the use of an outer (or perimeter) gridline, e.g., of 1-5 adjacent borders of quality control probes (FIG. 17A). Sections of the microarray near or at the edge can be dedicated to quality control probes such that any defect can be detected should it be present. In another embodiment, the integrity of biopolymer probe synthesis in the center of the microarray can be monitored through the use of a diagonal gridline (FIG. 17B). Quality control probes can be synthesized in positions that traverse the array diagonally, thus traversing representative sections of the microarray. In a preferred embodiment, gridline patterns are made up of quality control probes containing spacers.
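The perimeter and diagonal gridlines described above can be expressed as sets of array coordinates. The sketch below assumes a rectangular array addressed by zero-based (row, column) pairs; the function names and the coordinate convention are illustrative and do not reproduce the layouts of FIGS. 17A and 17B.

```python
def perimeter_gridline(rows: int, cols: int, width: int = 1) -> set:
    """Positions lying within `width` features of any edge of the array."""
    return {(r, c)
            for r in range(rows)
            for c in range(cols)
            if min(r, c, rows - 1 - r, cols - 1 - c) < width}

def diagonal_gridline(rows: int, cols: int) -> list:
    """One position per row, running from the top-left corner to the
    bottom-right corner of the array."""
    steps = max(rows - 1, 1)  # avoid division by zero for a one-row array
    return [(r, (r * (cols - 1)) // steps) for r in range(rows)]

print(len(perimeter_gridline(4, 4)))  # 12
print(diagonal_gridline(4, 4))        # [(0, 0), (1, 1), (2, 2), (3, 3)]
```

On an array that is wider than it is tall, this diagonal skips columns; a layout meant to visit every column would need more than one quality control probe per row.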

In another embodiment, clusters of quality control probes can be synthesized in sections of the microarray to assess synthesis quality. In one embodiment, quality control probes are synthesized in randomized positions throughout the middle of the array (FIG. 17C). In another embodiment, quality control probes can be synthesized at the corners of the microarray (FIG. 17D).

In another embodiment, when the microarrays are synthesized by ink jet technology, the quality control probes can be arranged on the microarray such that failures of particular nozzle(s) can be detected. A reduction in signal intensity in quality control probes that have a periodicity consistent with being printed by a particular nozzle can signify that that nozzle has been repeatedly defective. When there are N nozzles in the ink jet head, a reduction in quality control probe intensity with a periodicity of N signifies a clogged or defective nozzle (wherein N is a whole number of 1 or greater). In one embodiment, N is 20. In a further embodiment, the diagonal gridline (FIG. 17B) is used to assess nozzle clogs or defects.
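A periodic drop in intensity of the kind described above can be looked for by grouping quality control probe signals by the nozzle that printed them. The sketch below is one simple way to do this and is not taken from the specification: it assumes that the i-th quality control probe in print order was deposited by nozzle i mod N, and it flags a nozzle when its mean signal falls below an arbitrary fraction (here one half) of the median nozzle mean.

```python
from statistics import mean, median

def flag_nozzles(intensities, n_nozzles: int, min_fraction: float = 0.5) -> list:
    """Return indices of nozzles whose mean quality control signal is below
    `min_fraction` of the median nozzle mean (hypothetical criterion)."""
    if len(intensities) < n_nozzles:
        raise ValueError("need at least one probe per nozzle")
    per_nozzle = [[] for _ in range(n_nozzles)]
    for i, value in enumerate(intensities):
        per_nozzle[i % n_nozzles].append(value)  # assumed print order
    means = [mean(values) for values in per_nozzle]
    typical = median(means)
    return [k for k, m in enumerate(means) if m < min_fraction * typical]

# Four nozzles; every fourth probe starting at index 2 is dim.
signals = [100, 98, 20, 101, 99, 102, 25, 100, 97, 100, 22, 103]
print(flag_nozzles(signals, 4))  # [2]
```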

In another embodiment, quality control probe patterns can be used to impart positional information about the microarray. Because the sites at which the quality control probes are synthesized during microarray synthesis are known, probes can be used to align the microarray during processing.

5.4 Detection of Defects on a Microarray

All of the quality control probes that comprise the same predetermined binding sequence should bind to the binding partner similarly. However, in many instances, the inventors have found that spacers that increase the distance of the predetermined binding sequence from the microarray actually increase the signal intensity upon binding of a given predetermined binding sequence when compared to the signal observed from an identical predetermined binding sequence attached directly to the microarray. Without being bound by a particular mechanism, the increased signal intensity may result from the predetermined binding sequence being more accessible to its binding partner by virtue of its being further away from the microarray (e.g., by having spacers directly attached to the microarray comprising an increasing number of monomers contiguous with the predetermined binding sequence). A deviation in the amount of binding between different quality control probes and the binding partner that is greater than that expected due to differing distance of the predetermined binding sequence from the microarray may indicate a problem in microarray quality. Defects in microarray quality may be global (i.e., the defect affects the entire microarray) or localized (i.e., the defect affects one or more areas of the microarray and does not affect other areas).

In a specific embodiment, use of the quality control probe of the invention allows detection of microarray synthesis defects (e.g., a flow cell gradient where bubbles or other problems in the flow cell lead to non-uniform reagent coverage of the microarray during some of the synthesis cycles). However, other types of defects affecting microarray quality can also be detected by use of the quality control probes of the invention. Defects in the microarray can be due to occurrences other than synthesis defects. Quality control probes can be used to detect these types of defects as well. In one embodiment, microarray defects detectable by the methods of the invention occur during storage of the microarray. Suboptimal conditions (e.g., improper temperature or moisture level) can cause microarray quality to deteriorate. Other defects that are detectable by the methods of the invention include but are not limited to an abrasion that causes a localized defect on a microarray. Such an abrasion can occur during storage or processing of the microarray. A defect can occur during processing of the microarray. Such a defect can cause a non-uniformity of signal that can be detected by comparing signal intensities across the microarray. Comparison of binding intensities can be accomplished in a number of ways.

In one embodiment, a binding ratio for each set of quality control probes can be calculated. For quality control probes comprising predetermined binding sequences and not comprising spacers, signals generated during microarray processing for a particular quality control probe should equal signals generated for another different quality control probe. A ratio of the two signals should approach 1. Deviation from 1 indicates that one of the two quality control probes used in the calculation had decreased binding to its binding partner. Such would be the case if a synthesis defect caused the predetermined binding sequence in the quality control probe to be defective and thus unable to bind its binding partner at normal levels. For quality control probes comprising both predetermined binding sequence and spacers, signals generated during microarray processing for a particular quality control probe may or may not equal signals generated for another different quality control probe due to the differences in distance from the microarray. A ratio of the two signals from predetermined binding sequences that are a similar distance from the microarray (e.g., the synthesis of each predetermined binding sequence was initiated within 3 cycles of synthesis from each other) should approach 1. However, a ratio of the two signals from predetermined binding sequences that are different distances from the microarray (e.g., the synthesis of each predetermined binding sequence was initiated greater than 3 cycles of synthesis from each other) could deviate from 1. In this instance, the ratio expected can be determined using data from microarrays known to have no defects. Such microarrays can be identified, e.g., by making a plurality of arrays (preferably at least 5) and comparing the results to identify ones with no defects. Deviation from this determined expected ratio can then be used to detect defects in microarrays.

For each type of microarray (e.g., oligonucleotide, protein, etc.), the range of binding ratio values that indicates that there is no defect can be determined empirically. For example, various predetermined binding sequences known to be without defect can be bound to their binding partner and signals recorded. This can serve as the baseline values used to determine the expected binding ratios. By varying the horizontal and vertical placement of the quality control probes on the microarray, a range of acceptable ratios can be determined. Deviation from these empirically determined ratios indicates a defective microarray. In a specific embodiment, when the microarray is an oligonucleotide microarray, a binding ratio of between 0.25 and 2.25, 0.5 and 2.0, or 0.75 and 1.25 indicates no synthesis defect.
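A binding-ratio check of the kind described above can be sketched in Python as follows. The three numeric ranges are those recited for oligonucleotide microarrays in the preceding paragraph; the function names and the labels attached to the ranges are illustrative assumptions.

```python
# Ranges recited in the text; the dictionary keys are hypothetical labels.
NO_DEFECT_RANGES = {
    "broad": (0.25, 2.25),
    "intermediate": (0.5, 2.0),
    "narrow": (0.75, 1.25),
}

def binding_ratio(signal_a: float, signal_b: float) -> float:
    """Ratio of the signals from two quality control probes."""
    if signal_b <= 0:
        raise ValueError("reference signal must be positive")
    return signal_a / signal_b

def ratio_in_range(ratio: float, low: float, high: float) -> bool:
    """True when the ratio lies inside the accepted no-defect range."""
    return low <= ratio <= high

ratio = binding_ratio(300.0, 1000.0)
for label, (low, high) in NO_DEFECT_RANGES.items():
    print(label, ratio_in_range(ratio, low, high))
# broad True
# intermediate False
# narrow False
```

For probes whose binding sequences sit at different distances from the substrate, the same check would be applied around the empirically determined expected ratio rather than around 1.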

In a specific embodiment, for microarrays using quality control probes that are a mixture of those comprising predetermined binding sequences comprising one or more intentional deletions relative to a sequence that binds a binding partner used to detect the quality control probe during microarray processing and those comprising predetermined binding sequences with no intentional deletion, binding ratios can be calculated and used to assess microarray quality. Signals generated by binding of a labeled binding partner to each type of predetermined binding sequence (i.e., either with or without intentional deletions) will necessarily be different. Binding intensities determined from microarrays known to have no defects can be used to calculate expected binding ratios. Such microarrays can be identified, e.g., by making a plurality of arrays (preferably at least 5) and comparing the results to identify ones with no defects. Deviation from the expected ratio indicates a defect.

In another embodiment, comparison of binding intensities can be accomplished through a statistical analysis. The mean binding intensity for a group of quality control probes can be calculated by averaging the value of the signal (e.g., fluorescence) observed for each. The amount of signal observed for each individual quality control probe can then be compared to the mean of the group. In one embodiment, those quality control probes that are within two standard deviations from the mean indicate that there is no quality defect in the microarray, e.g., that there was no defect during their synthesis, or incurred during processing, storage, or otherwise. In a more preferred embodiment, those quality control probes that are within one standard deviation from the mean indicate that there is no defect.
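The statistical comparison described above can be sketched with the Python standard library. The function name is hypothetical, and the use of the sample standard deviation is an assumption, since the text does not say which estimator is intended.

```python
from statistics import mean, stdev

def probes_outside(signals, n_sd: float = 2.0) -> list:
    """Indices of quality control probes whose signal lies more than
    `n_sd` sample standard deviations from the group mean."""
    centre = mean(signals)
    spread = stdev(signals)
    return [i for i, s in enumerate(signals)
            if abs(s - centre) > n_sd * spread]

signals = [100, 102, 98, 101, 99, 100, 103, 97, 100, 40]
print(probes_outside(signals, n_sd=2.0))  # [9]
print(probes_outside(signals, n_sd=1.0))  # [9]
```

Note that a strongly deviant probe inflates both the mean and the standard deviation it is judged against, so with small groups a baseline taken from defect-free arrays may be a more sensitive reference.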

In another embodiment, more than one fluorescent dye can be used to label the binding partner which binds to the predetermined binding sequence. For example, a subset of the binding partners can be labeled with Cy3 and a subset can be labeled with Cy5. A ratio of signal detected from a single quality control probe for each type of fluor used can be determined. By varying the horizontal and vertical placement of the quality control probes on the microarray, a range of acceptable ratios can be determined. Deviation from this empirically determined ratio indicates a microarray defect.

For microarrays using quality control probes without predetermined binding sequences and synthesized with labeled monomers, similar methods can be used to detect defects. Instead of the signal originating from the labeled binding partner of the predetermined binding sequence, it will come from the quality control probe itself that is attached to the microarray. Ratios and standard deviations from the mean signal can be used to assess integrity of the microarray.

For microarrays using quality control probes without predetermined binding sequences or labeled monomers, similar methods can be used to detect quality defects. In these microarrays, however, the detectable signal is proportional to the length of the quality control probe; thus, signal intensities should not be similar for quality control probes of differing lengths. Rather, a more intense signal is expected from longer quality control probes. Deviation from the differences expected to be seen between probes indicates a defect in the microarray.
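Where the signal is expected to scale with probe length, the expected signal can be estimated from defect-free reference data and compared with the observed values. The sketch below fits a line through the origin by least squares and flags probes that deviate by more than a relative tolerance. The proportional model follows the text; the fitting method, the 20% tolerance, and all names are assumptions made for illustration.

```python
def signal_per_monomer(ref_lengths, ref_signals) -> float:
    """Least-squares slope of signal versus length for a line through the
    origin, estimated from arrays assumed to be free of defects."""
    numerator = sum(l * s for l, s in zip(ref_lengths, ref_signals))
    denominator = sum(l * l for l in ref_lengths)
    return numerator / denominator

def flag_by_length(probes, slope: float, tolerance: float = 0.2) -> list:
    """Indices of (length, signal) pairs whose signal differs from
    slope * length by more than `tolerance`, as a fraction of expected."""
    flagged = []
    for i, (length, signal) in enumerate(probes):
        expected = slope * length
        if abs(signal - expected) > tolerance * expected:
            flagged.append(i)
    return flagged

slope = signal_per_monomer([10, 20, 30], [100.0, 200.0, 300.0])
print(slope)                                           # 10.0
print(flag_by_length([(20, 190.0), (30, 150.0)], slope))  # [1]
```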

In one embodiment, when mixtures of quality control probes are used, expected binding ratios or signal intensities can be determined empirically. Microarrays that are known to contain no defects can be used to get baseline values for predetermined binding sequence binding to its binding partner or signal intensities for each of the different types of quality control probes. Ratios can be determined from this data and used as the expected ratios. Deviation from these ratios indicates a defective microarray.

5.5 Microarray Synthesis and Processing

The probes on microarrays can be any one of a number of different biopolymers, e.g., DNAs, RNAs, peptide nucleic acids (PNAs) (see e.g., Egholm et al., 1993, Nature 363:566-568; U.S. Pat. No. 5,539,083), or proteins. The microarrays of the invention are synthesized by a step-by-step addition of monomers onto a solid support. Each such monomer is a unit of biopolymer that is added during one synthesis cycle. In one embodiment, the unit of biopolymer added per synthesis cycle is itself composed of not more than one basic biopolymer unit (e.g., a nucleotide, amino acid, etc.). In another embodiment, the unit of biopolymer added per synthesis cycle consists of more than one basic biopolymer unit (e.g., a dinucleotide, a dipeptide, a nucleotide or amino acid covalently linked to another moiety, etc.). In another embodiment, the unit of biopolymer added per synthesis cycle varies with different synthesis cycles.

5.5.1 Nucleotide Microarrays

In a preferred embodiment in the present invention, sample processing isthrough hybridization on a nucleotide microarray. In a more preferredembodiment, the microarray is an oligonucleotide array. In a mostpreferred embodiment, the oligonucleotide array is an inkjet-synthesized oligonucleotide microarray. Preferably, the microarraycontains in the range of 20 to 50,000 nucleic acid probes. The probescan be arranged in a variety of patterns. For example, the probes can bearranged in rows and columns, polygonal (e.g., hexagonal), or circularpatterns, etc.

Hybridization levels are preferably measured using polynucleotide probearrays or microarrays. On a polynucleotide array, polynucleotide probescomprising sequences of interest are immobilized to the surface of asupport, e-g., a solid support. For example, the probes may comprise DNAsequences, RNA sequences, or copolymer sequences of DNA and RNA. Thepolynucleotide sequences of the probes may also comprise DNA and/or RNAanalogues (e.g., peptide nucleic acids), or combinations thereof Forexample, the polynucleotide sequences of the probe may be fill orpartial sequences of genomic DNA or mRNA derived from cells, or may becDNA or cRNA sequences derived therefrom.

The probe or probes used in the methods of the invention are preferably immobilized to a solid support or surface which may be either porous or non-porous. For example, the probes of the invention may be polynucleotide sequences which are attached to a nitrocellulose or nylon membrane or filter. Such hybridization probes are well known in the art (see, e.g., Sambrook et al., Eds., 1989, Molecular Cloning: A Laboratory Manual, Vols. 1-3, 2nd ed., Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y.). Alternatively, the solid support or surface may be a glass or plastic surface.

5.5.1.1 Hybridization Assay using Microarrays

A microarray is an array of positionally-addressable binding (e.g., hybridization) sites on a support. Each of such binding sites comprises a plurality of polynucleotide molecules of a probe bound to the predetermined region on the support. Microarrays can be made in a number of ways, of which several are described herein below (see, e.g., Meltzer, 2001, Curr. Opin. Genet. Dev. 11(3):258-63; Andrews et al., 2000, Genome Res. 10(12):2030-43; Abdellatif, 2000, Circ. Res. 86(9):919-20; Lennon, 2000, Drug Discov. Today 5(2):59-66; Zweiger, 1999, Trends Biotechnol. 17(11):429-36). However produced, microarrays share certain characteristics. The arrays are preferably reproducible, allowing multiple copies of a given array to be produced and easily compared with each other. Preferably, the microarrays are made from materials that are stable under binding (e.g., nucleic acid hybridization) conditions. The microarrays are preferably between 1 cm² and 25 cm², preferably about 10 cm² to 15 cm². However, both larger and smaller (e.g., 0.5 cm² or less) arrays are also contemplated and may be preferable, e.g., for simultaneously evaluating a very large number of different probes.

In a particularly preferred embodiment, hybridization levels are measured to microarrays of probes consisting of a solid phase on the surface of which are immobilized a population of polynucleotides, such as a population of DNA or DNA mimics or, alternatively, a population of RNA or RNA mimics. The solid phase may be a nonporous or, optionally, a porous material such as a gel. Microarrays can be employed, e.g., for analyzing the transcriptional state of a cell, such as the transcriptional states of cells exposed to graded levels of a drug of interest or to graded perturbations to a biological pathway of interest. Microarrays can be used to simultaneously screen a plurality of different probes to evaluate, e.g., each probe's sensitivity and specificity for a particular target polynucleotide.

Preferably, a given binding site or unique set of binding sites on the microarray will specifically bind (e.g., hybridize) to the product of a single gene or gene transcript from a cell or organism (e.g., to a specific mRNA or to a specific cDNA derived therefrom). However, in general, other related or similar sequences may cross hybridize to a given binding site.

The microarrays used in the methods and compositions of the present invention include one or more test probes, each of which has a polynucleotide sequence that is complementary to a subsequence of RNA or DNA to be detected. Each probe preferably has a different nucleic acid sequence, and the position of each probe on the solid surface of the array is preferably known. Indeed, the microarrays are preferably addressable arrays, more preferably positionally addressable arrays. More specifically, each probe of the array is preferably located at a known, predetermined position on the solid support such that the identity (i.e., the sequence) of each probe can be determined from its position on the array (i.e., on the support or surface).
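The notion of a positionally addressable array can be illustrated with a minimal Python sketch. The layout dictionary, the coordinates, the short sequences, and the helper names below are hypothetical and chosen only for illustration; they are not taken from the specification.

```python
from typing import Dict, Tuple

# Hypothetical layout: (row, column) on the support -> probe sequence (5'->3').
# Real arrays hold thousands of probes; three short ones suffice here.
LAYOUT: Dict[Tuple[int, int], str] = {
    (0, 0): "ACGTACGTAC",
    (0, 1): "TTGACCGTAA",
    (1, 0): "GGCATCGATC",
}


def probe_at(layout: Dict[Tuple[int, int], str], row: int, col: int) -> str:
    """Return the identity (sequence) of the probe from its position alone."""
    return layout[(row, col)]


def all_probes_distinct(layout: Dict[Tuple[int, int], str]) -> bool:
    """Check that every position carries a different (non-identical) sequence."""
    return len(set(layout.values())) == len(layout)
```

Because the position is the lookup key, a signal measured at a given site can be attributed to one sequence without any further labeling of the probe itself.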

Preferably, the density of probes on a microarray is about 100 different (i.e., non-identical) probes per 1 cm² or higher. More preferably, a microarray used in the methods of the invention will have at least 550 probes per 1 cm², at least 1000 probes per 1 cm², at least 1500 probes per 1 cm² or at least 2000 probes per 1 cm². In a particularly preferred embodiment, the microarray is a high density array, preferably having a density of at least about 2500 different probes per 1 cm². The microarrays used in the invention therefore preferably contain at least 2500, at least 5000, at least 10000, at least 15000, at least 20000, at least 25000, at least 50000 or at least 55000 different (i.e., non-identical) probes. A subset of these probes will correspond to spike-in tags which may have been added to the sample.

Such polynucleotides are preferably of the length of 15 to 200 bases, more preferably of the length of 20 to 100 bases, most preferably 40-60 bases. It will be understood that each probe sequence may also comprise a linker (e.g., spacer) in addition to the sequence that is complementary to its target sequence. As used herein, a linker refers to a chemical structure between the sequence that is complementary to its target sequence and the surface. The linker need not be a nucleotide sequence. For example, the linker can be composed of a nucleotide sequence, or peptide nucleic acids, hydrocarbon chains, etc.

In one embodiment, the microarray is an array (i.e., a matrix) in which each position represents a discrete binding site for a transcript encoded by a gene (e.g., for an mRNA or a cDNA derived therefrom). For example, in various embodiments, the microarrays of the invention can comprise binding sites for products encoded by fewer than 50% of the genes in the genome of an organism. Alternatively, the microarrays of the invention can have binding sites for the products encoded by at least 50%, at least 75%, at least 85%, at least 90%, at least 95%, at least 99% or 100%, or at least 50, 100, 500, 1000, or 10000 of the genes in the genome of an organism. In other embodiments, the microarrays of the invention can have binding sites for products encoded by fewer than 50%, by at least 50%, by at least 75%, by at least 85%, by at least 90%, by at least 95%, by at least 99% or by 100% of the genes expressed by a cell of an organism. The binding site can be a DNA or DNA analog to which a particular RNA can specifically hybridize. The DNA or DNA analog can be, e.g., a synthetic oligomer or a gene fragment, e.g., corresponding to an exon.

Preferably, the microarrays used in the invention have binding sites (i.e., probes) for sets of genes, or for one or more genes, relevant to the action of a drug of interest or in a biological pathway of interest. As discussed above, a “gene” is identified as a portion of DNA that is transcribed by RNA polymerase, which may include a 5′ untranslated region (UTR), introns, exons and a 3′ UTR. The number of genes in a genome can be estimated from the number of mRNA molecules expressed by the cell or organism, or by extrapolation of a well characterized portion of the genome. When the genome of the organism of interest has been sequenced, the number of open reading frames (ORFs) can be determined and mRNA coding regions identified by analysis of the DNA sequence. For example, the genome of Saccharomyces cerevisiae has been completely sequenced and is reported to have approximately 6275 ORFs encoding sequences longer than 99 amino acid residues in length. Analysis of these ORFs indicates that there are 5,885 ORFs that are likely to encode protein products (Goffeau et al., 1996, Science 274:546-567). In contrast, the human genome is estimated to contain approximately 30000 to 130000 genes (see Crollius et al., 2000, Nature Genetics 25:235-238; Ewing et al., 2000, Nature Genetics 25:232-234). Genome sequences for other organisms, including but not limited to Drosophila, C. elegans, plants, e.g., rice and Arabidopsis, and mammals, e.g., mouse and human, are also completed or nearly completed. Thus, in preferred embodiments of the invention, an array set comprising probes for all genes in the genome of an organism is provided.

It will be appreciated that when a sample of target nucleic acid molecules, e.g., cDNA complementary to the RNA of a cell, is made and hybridized to a microarray under suitable hybridization conditions, the level of hybridization to the site in the array will reflect the prevalence of the corresponding complementary sequences in the sample. For example, when detectably labeled (e.g., with a fluorophore) cDNA is hybridized to a microarray, the site on the array corresponding to a nucleotide sequence that is not in the sample will have little or no signal (e.g., fluorescent signal), and a nucleotide sequence that is prevalent in the sample will have a relatively strong signal. The relative abundance of different nucleotide sequences in a sample may be determined by the signal strength pattern of probes on a microarray.

Nucleic acids from samples from two different cells subjected to two different conditions can be hybridized to the binding sites of the microarray using a two-color protocol. In the case of drug responses, one cell sample is exposed to a drug and another cell sample of the same type is not exposed to the drug. The cDNA derived from each of the two cell types is differently labeled (e.g., with Cy3 and Cy5) so that they can be distinguished. In one embodiment, for example, cDNA from a cell treated with a drug (or having a mutation or a disease, etc.) is synthesized using a fluorescein-labeled dNTP, and cDNA from a second cell, not drug-exposed, is synthesized using a rhodamine-labeled dNTP. When the two cDNA molecules are mixed and hybridized to the microarray, the relative intensity of signal from each cDNA set is determined for each site on the array, and any relative difference in abundance of a particular gene detected.

In the example described above, the nucleic acid from the drug-treated cell will fluoresce green when the fluorophore is stimulated and the nucleic acid from the untreated cell will fluoresce red. As a result, when the drug treatment has no effect, either directly or indirectly, on the transcription of a particular gene in a cell, the expression patterns will be indistinguishable in both cells and, upon reverse transcription, red-labeled and green-labeled nucleic acids will be equally prevalent. When hybridized to the microarray, the binding site(s) for that species of nucleic acid will emit wavelengths characteristic of both fluorophores. In contrast, when the drug-exposed cell is treated with a drug that, directly or indirectly, changes the transcription of a particular gene in the cell, the expression pattern as represented by the ratio of green to red fluorescence for each binding site will change. When the drug increases the prevalence of an mRNA, the ratio for each binding site of the mRNA will increase, whereas when the drug decreases the prevalence of an mRNA, the ratio for each binding site of the mRNA will decrease.
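The green-to-red comparison described above can be sketched in Python as follows. The function names, the use of a base-2 logarithm, and the two-fold cutoff are assumptions made for illustration; the specification does not prescribe a particular scale or threshold.

```python
import math


def log2_ratio(green: float, red: float) -> float:
    """Log2 of the green (treated) to red (untreated) intensity ratio.

    Both intensities are assumed to be background-subtracted and positive.
    """
    if green <= 0 or red <= 0:
        raise ValueError("intensities must be positive")
    return math.log2(green / red)


def direction(green: float, red: float, cutoff: float = 1.0) -> str:
    """Classify a binding site; `cutoff` (1.0 = two-fold) is an assumed value."""
    value = log2_ratio(green, red)
    if value >= cutoff:
        return "increased"
    if value <= -cutoff:
        return "decreased"
    return "unchanged"
```

A site with equal green and red signal gives a log ratio of zero, which corresponds to the case in which the drug has no effect on that transcript.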

The use of a two-color fluorescence labeling and detection scheme to define alterations in gene expression has been described in connection with detection of mRNA molecules, e.g., in Schena et al., 1995, Quantitative monitoring of gene expression patterns with a complementary DNA microarray, Science 270:467-470. An advantage of using cDNA labeled with two different fluorophores is that a direct and internally controlled comparison of the mRNA or exon expression levels corresponding to each arrayed gene in two cell states can be made, and variations due to minor differences in experimental conditions (e.g., hybridization conditions) will not affect subsequent analyses. However, it will be recognized that it is also possible to use cDNA from a single cell, and compare, for example, the absolute amount of a particular exon in, e.g., a drug-treated or pathway-perturbed cell and an untreated cell. Furthermore, labeling with more than two colors is also contemplated in the present invention. In some embodiments of the invention, at least 5, 10, 20, or 100 dyes of different colors can be used for labeling. Such labeling permits simultaneous hybridizing of the distinguishably labeled cDNA populations to the same array, and thus measuring, and optionally comparing the expression levels of, mRNA molecules derived from more than two samples. Dyes that can be used include, but are not limited to, fluorescein and its derivatives, rhodamine and its derivatives, Texas Red, 5′carboxy-fluorescein (FAM), 2′,7′-dimethoxy-4′,5′-dichloro-6-carboxy-fluorescein (JOE), N,N,N′,N′-tetramethyl-6-carboxy-rhodamine (TAMRA), 6′carboxy-X-rhodamine (ROX), HEX, TET, IRD40, and IRD41; cyanine dyes, including but not limited to Cy3, Cy3.5 and Cy5; BODIPY dyes, including but not limited to BODIPY-FL, BODIPY-TR, BODIPY-TMR, BODIPY-630/650, and BODIPY-650/670; and ALEXA dyes, including but not limited to ALEXA-488, ALEXA-532, ALEXA-546, ALEXA-568, and ALEXA-594; as well as other fluorescent dyes which will be known to those who are skilled in the art.

5.5.1.2 Preparing Probes for Microarrays

As noted above, the probe to which a particular polynucleotide molecule specifically hybridizes is a complementary polynucleotide sequence. Typically each probe on the microarray will be between 20 bases and 600 bases, and usually between 30 and 200 bases in length.

The polynucleotide probes of the microarray are generated by synthesis of synthetic polynucleotides or oligonucleotides, e.g., using H-phosphonate or phosphoramidite chemistries (Froehler et al., 1986, Nucleic Acid Res. 14:5399-5407; McBride et al., 1983, Tetrahedron Lett. 24:246-248). Synthetic sequences are typically between about 15 and about 600 bases in length, more typically between about 20 and about 100 bases, most preferably between about 40 and about 70 bases in length.

The probes on the microarrays are macromolecules attached to the solid support of a microarray. In the present invention, the probes are preferably nucleic acid sequences (or fragments thereof).

5.5.1.3 Attaching Probes to the Solid Surface

Methods of the invention utilize polynucleotide probes synthesized directly on the support to form the array. The probes are attached to a solid support or surface, which may be made, e.g., from glass, plastic (e.g., polypropylene, nylon), polyacrylamide, nitrocellulose, gel, or other porous or nonporous material.

One method for making microarrays is to make high-density oligonucleotide arrays. There are a variety of techniques known for producing arrays containing thousands of oligonucleotides complementary to defined sequences, at defined locations on a surface. For example, photolithographic techniques for synthesis in situ (see Fodor et al., 1991, Science 251:767-773; Pease et al., 1994, Proc. Natl. Acad. Sci. U.S.A. 91:5022-5026; Lockhart et al., 1996, Nature BioTechnology 14:1675; U.S. Pat. Nos. 5,489,678; 5,578,832; 5,556,752; 5,510,270; 6,197,506; and 6,346,413) or other methods for rapid synthesis and deposition of defined oligonucleotides (Blanchard et al., 1996, Biosensors & Bioelectronics 11:687-690) may be used.

Other methods for making microarrays, e.g., by masking (Maskos and Southern, 1992, Nucl. Acids. Res. 20:1679-1684), may also be used. In principle, and as noted supra, any type of array, for example, dot blots on a nylon hybridization membrane (see Sambrook et al., supra), could be used. However, as will be recognized by those skilled in the art, very small arrays will frequently be preferred because hybridization volumes will be smaller.

In a particularly preferred embodiment, microarrays of the invention are manufactured by means of an ink jet printing device for oligonucleotide synthesis, e.g., using the methods and systems described by Blanchard in International Patent Publication No. WO 98/41531, published Sep. 24, 1998; Blanchard et al., 1996, Biosensors and Bioelectronics 11:687-690; Blanchard, 1998, in Synthetic DNA Arrays in Genetic Engineering, Vol. 20, J. K. Setlow, Ed., Plenum Press, New York at pages 111-123; Hughes et al., 2001, Nature BioTechnology 19:342-347; and U.S. Pat. No. 6,028,189 to Blanchard. Specifically, the oligonucleotide probes in such microarrays are preferably synthesized in arrays, e.g., on a glass slide, by serially depositing individual nucleotide bases in microdroplets of a high surface tension solvent such as propylene carbonate. The microdroplets have small volumes (e.g., 100 pL or less, more preferably 50 pL or less) and are separated from each other on the microarray (e.g., by hydrophobic domains) to form circular surface tension wells which define the locations of the array elements (i.e., the different probes). Polynucleotide probes are attached to the surface covalently at the 3′ end of the polynucleotide.
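Serial, base-by-base deposition can be pictured as a per-cycle plan of which base goes to which array element. The Python sketch below is illustrative only: the function name, the coordinate keys, and the data layout are hypothetical, and the sketch assumes sequences are written 5′ to 3′ so that, with probes attached at the 3′ end, the 3′-most base is deposited in the first cycle.

```python
from typing import Dict, List, Tuple

Position = Tuple[int, int]


def deposition_plan(probes: Dict[Position, str]) -> List[Dict[Position, str]]:
    """Return, for each synthesis cycle, the base deposited at each position.

    Sequences are given 5'->3'. Since the probe is anchored at its 3' end,
    cycle 0 deposits the last character of each sequence, cycle 1 the
    second-to-last, and so on. Positions whose probe is already complete
    receive no base in later cycles.
    """
    n_cycles = max((len(seq) for seq in probes.values()), default=0)
    plan: List[Dict[Position, str]] = []
    for cycle in range(n_cycles):
        step: Dict[Position, str] = {}
        for position, seq in probes.items():
            if cycle < len(seq):
                step[position] = seq[len(seq) - 1 - cycle]
        plan.append(step)
    return plan
```

For example, probes "ACG" and "TT" at two positions need three cycles; the shorter probe is finished after the second cycle and is skipped in the third.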

When these methods are used, oligonucleotides (e.g., 60-mers) of known sequence are synthesized directly on a surface such as a derivatized glass slide. The array produced can be redundant, with several oligonucleotide molecules per gene.

5.5.1.4 Target Polynucleotide Molecules

Target polynucleotides are the polynucleotides of the biological samples that are being processed on the microarray. Target polynucleotides can be RNA molecules such as, but by no means limited to, messenger RNA (mRNA) molecules, ribosomal RNA (rRNA) molecules, cRNA molecules (i.e., RNA molecules prepared from cDNA molecules that are transcribed in vitro) and fragments thereof. Additionally, target polynucleotides may also be, but are not limited to, DNA molecules such as genomic DNA molecules, cDNA molecules, and fragments thereof including oligonucleotides, ESTs, STSs, etc. In specific embodiments, the sample comprises more than 1000, 5000, 10000, 50000, 100000, 250000, or 1000000 nucleic acid molecules of different nucleotide sequences.

The target polynucleotides may be from any source. For example, the target polynucleotide molecules may be naturally occurring nucleic acid molecules such as genomic or extragenomic DNA molecules isolated from an organism, or RNA molecules, such as mRNA molecules, isolated from an organism. Alternatively, the polynucleotide molecules may be synthesized, including, e.g., nucleic acid molecules synthesized enzymatically in vivo or in vitro, such as cDNA molecules, or polynucleotide molecules synthesized by PCR, RNA molecules synthesized by in vitro transcription, etc. The sample of target polynucleotides can comprise, e.g., molecules of DNA, RNA, or copolymers of DNA and RNA. In preferred embodiments, the target polynucleotides of the invention will correspond to particular genes or to particular gene transcripts (e.g., to particular mRNA sequences expressed in cells or to particular cDNA sequences derived from such mRNA sequences). However, in many embodiments, particularly those embodiments wherein the polynucleotide molecules are derived from mammalian cells, the target polynucleotides may correspond to particular fragments of a gene transcript. For example, the target polynucleotides may correspond to different exons of the same gene, e.g., so that different splice variants of that gene may be detected and/or analyzed.

In preferred embodiments, the target polynucleotides to be analyzed are prepared in vitro from nucleic acids extracted from cells. For example, in one embodiment, RNA is extracted from cells (e.g., total cellular RNA, poly(A)+ messenger RNA, or a fraction thereof) and messenger RNA is purified from the total extracted RNA. Methods for preparing total and poly(A)+ RNA are well known in the art, and are described generally, e.g., in Sambrook et al., supra. In one embodiment, RNA is extracted from cells of the various types of interest in this invention using guanidinium thiocyanate lysis followed by CsCl centrifugation and an oligo dT purification (Chirgwin et al., 1979, Biochemistry 18:5294-5299). In another embodiment, total RNA is extracted from cells using guanidinium thiocyanate lysis followed by purification on RNeasy columns (Qiagen). cDNA is then synthesized from the purified mRNA using, e.g., oligo-dT or random primers. In preferred embodiments, the target polynucleotides are cRNA prepared from cDNA prepared from purified mRNA or from total RNA extracted from cells. As used herein, cRNA can either be complementary to (anti-sense) or of the same sequence (sense) as the sample RNA. The extracted RNA molecules are amplified using a process in which double-stranded cDNA molecules are synthesized from the sample RNA molecules using primers linked to an RNA polymerase promoter. As a result, RNA polymerase promoters can be incorporated into either or both strands of the cDNA. Using the RNA polymerase promoter that is on the first strand of the cDNA molecule, cRNA can be synthesized that is the same sequence as the sample RNA. To synthesize cRNA complementary to the sample RNA, transcription can be initiated from the RNA polymerase promoter that is on the second strand of the double-stranded cDNA molecule using an RNA polymerase (see, e.g., U.S. Pat. Nos. 5,891,636; 5,716,785; 5,545,522 and 6,132,997; see also U.S. Pat. No. 6,271,002 and U.S. Provisional Patent Application Ser. No. 60/253,641, filed on Nov. 28, 2000, by Ziman et al.). Either oligo-dT primers (U.S. Pat. Nos. 5,545,522 and 6,132,997) or random primers (U.S. Provisional Patent Application Ser. No. 60/253,641, filed on Nov. 28, 2000, by Ziman et al.) that contain an RNA polymerase promoter or complement thereof can be used. Preferably, the target polynucleotides are short and/or fragmented polynucleotide molecules which are representative of the original nucleic acid population of the cell. In one embodiment, total RNA is used as input for cRNA synthesis. An oligo-dT primer containing a T7 RNA polymerase promoter sequence can be used to prime first strand cDNA synthesis. When second strand synthesis is desired, random hexamers can be used to prime second strand cDNA synthesis by a reverse transcriptase. This reaction yields a double-stranded cDNA that contains the T7 RNA polymerase promoter at the 3′ end. The double-stranded cDNA can then be transcribed into cRNA by T7 RNA polymerase.

The target polynucleotides to be analyzed are preferably detectably labeled. For example, cDNA can be labeled directly, e.g., with nucleotide analogs, or indirectly, e.g., by making a second, labeled cDNA strand using the first strand as a template. Alternatively, the double-stranded cDNA can be transcribed into cRNA and labeled.

Preferably, the detectable label is a fluorescent label, e.g., by incorporation of nucleotide analogs. Other labels suitable for use in the present invention include, but are not limited to, biotin, iminobiotin, antigens, cofactors, dinitrophenol, lipoic acid, olefinic compounds, detectable polypeptides, electron rich molecules, enzymes capable of generating a detectable signal by action upon a substrate, and radioactive isotopes. Preferred radioactive isotopes include ³²P, ³⁵S, ¹⁴C, ¹⁵N and ¹²⁵I. Fluorescent molecules suitable for the present invention include, but are not limited to, fluorescein and its derivatives, rhodamine and its derivatives, Texas Red, 5′carboxy-fluorescein (FAM), 2′,7′-dimethoxy-4′,5′-dichloro-6-carboxy-fluorescein (JOE), N,N,N′,N′-tetramethyl-6-carboxy-rhodamine (TAMRA), 6′carboxy-X-rhodamine (ROX), HEX, TET, IRD40, and IRD41. Fluorescent molecules that are suitable for the invention further include: cyanine dyes, including but not limited to Cy3, Cy3.5 and Cy5; BODIPY dyes, including but not limited to BODIPY-FL, BODIPY-TR, BODIPY-TMR, BODIPY-630/650, and BODIPY-650/670; and ALEXA dyes, including but not limited to ALEXA-488, ALEXA-532, ALEXA-546, ALEXA-568, and ALEXA-594; as well as other fluorescent dyes which will be known to those who are skilled in the art. Electron rich indicator molecules suitable for the present invention include, but are not limited to, ferritin, hemocyanin, and colloidal gold. Alternatively, in less preferred embodiments the target polynucleotides may be labeled by specifically complexing a first group to the polynucleotide. A second group, covalently linked to an indicator molecule and which has an affinity for the first group, can be used to indirectly detect the target polynucleotide. In such an embodiment, compounds suitable for use as a first group include, but are not limited to, biotin and iminobiotin. Compounds suitable for use as a second group include, but are not limited to, avidin and streptavidin.

The binding partners of the predetermined binding sequence of the quality control probes can be added to the target molecules prior to contact with the microarray. In one embodiment, the binding partners are added to the target molecules during target molecule processing. In a more preferred embodiment, the binding partners are added to the target molecules immediately prior to contacting the microarray.

5.5.1.5 Hybridization to Microarrays

As described supra, nucleic acid hybridization and wash conditions are chosen so that the polynucleotide molecules to be analyzed (or target polynucleotide molecules) specifically bind or specifically hybridize to the complementary polynucleotide sequences of the array, preferably to one or more specific array sites where the complementary sequence is located.

Arrays containing double-stranded probe DNA situated thereon are preferably subjected to denaturing conditions to render the DNA single-stranded prior to contacting with the target polynucleotide molecules. Arrays containing single-stranded probe DNA (e.g., synthetic oligodeoxyribonucleic acids) may need to be denatured prior to contacting with the target polynucleotide molecules, e.g., to remove hairpins or dimers which form due to self complementary sequences.

Optimal hybridization conditions will depend on the length (e.g., oligomer versus polynucleotide greater than 200 bases) and type (e.g., RNA or DNA) of probe and target nucleic acids. General parameters for specific (i.e., stringent) hybridization conditions for nucleic acids are described in Sambrook et al. (supra), and in Ausubel et al., 1987, Current Protocols in Molecular Biology, Greene Publishing and Wiley-Interscience, New York. For example, when cDNA microarrays are used, typical hybridization conditions are hybridization in 5×SSC plus 0.2% SDS at 65° C. for four hours, followed by washes at 25° C. in low stringency wash buffer (1×SSC plus 0.2% SDS), followed by 10 minutes at 25° C. in higher stringency wash buffer (0.1×SSC plus 0.2% SDS) (Hughes et al., 2001, Nature BioTechnology 19:342-347). Useful hybridization conditions are also provided in, e.g., Tijssen, 1993, Hybridization With Nucleic Acid Probes, Elsevier Science Publishers B. V. and Kricka, 1992, Nonisotopic DNA Probe Techniques, Academic Press, San Diego, Calif.

Particularly preferred hybridization conditions for use with the screening and/or signaling chips of the present invention include hybridization at a temperature at or near the mean melting temperature of the probes (e.g., within 5° C., more preferably within 2° C.) in 1 M NaCl, 50 mM MES buffer (pH 6.5), 0.5% sodium sarcosine and 30% formamide.

5.5.1.6 Signal Detection and Data Analysis

It will be appreciated that when target sequences, e.g., cDNA or cRNA, complementary to the RNA of a cell are made and hybridized to a microarray under suitable hybridization conditions, the level of hybridization to the site in the array corresponding to a particular gene will reflect the prevalence in the cell of the mRNA or mRNA molecules containing the transcript from that gene. For example, when detectably labeled (e.g., with a fluorophore) cDNA complementary to the total cellular mRNA is hybridized to a microarray, the site on the array corresponding to a gene (i.e., capable of specifically binding the product or products of the expressed gene) that is not transcribed in the cell will have little or no signal (e.g., fluorescent signal), and a gene for which the encoded mRNA expressing the transcript is prevalent will have a relatively strong signal.

In preferred embodiments, target sequences, e.g., cDNA molecules or cRNA molecules, from two different cells are hybridized to the binding sites of the microarray. In the case of drug responses, one cell sample is exposed to a drug and another cell sample of the same type is not exposed to the drug. In the case of pathway responses, one cell is exposed to a pathway perturbation and another cell of the same type is not exposed to the pathway perturbation. The cDNA or cRNA derived from each of the two cell types is differently labeled so that they can be distinguished. In one embodiment, for example, cDNA from a cell treated with a drug (or otherwise perturbed) is synthesized using a fluorescein-labeled dNTP, and cDNA from a second cell, not drug-exposed, is synthesized using a rhodamine-labeled dNTP. When the two cDNA molecules are mixed and hybridized to the microarray, the relative intensity of signal from each cDNA set is determined for each site on the array, and any relative difference in abundance of a particular transcript detected.

In the example described in the previous paragraph, the cDNA from the drug-treated (or otherwise perturbed) cell will fluoresce green when the fluorophore is stimulated and the cDNA from the untreated cell will fluoresce red. As a result, when the drug treatment has no effect, either directly or indirectly, on the transcription of a particular gene in a cell, the expression pattern will be indistinguishable in both cells and, upon reverse transcription, red-labeled and green-labeled cDNA will be equally prevalent. When hybridized to the microarray, the binding site(s) for that species of RNA will emit wavelengths characteristic of both fluorophores. In contrast, when the drug-exposed cell is treated with a drug that, directly or indirectly, changes the transcription or splicing of a particular gene in the cell, the expression pattern as represented by the ratio of green to red fluorescence for each transcript binding site will change. When the drug increases the prevalence of an mRNA, the ratio for each transcript fragment expressed in the mRNA will increase, whereas when the drug decreases the prevalence of an mRNA, the ratio for each exon expressed in the mRNA will decrease.

The use of a two-color fluorescence labeling and detection scheme to define alterations in gene expression has been described in connection with detection of mRNA molecules, e.g., in Schena et al., 1995, Quantitative monitoring of gene expression patterns with a complementary DNA microarray, Science 270:467-470. An advantage of using target sequences, e.g., cDNA molecules or cRNA molecules, labeled with two different fluorophores is that a direct and internally controlled comparison of the mRNA expression levels corresponding to each arrayed gene in two cell states can be made, and variations due to minor differences in experimental conditions (e.g., hybridization conditions) will not affect subsequent analyses. However, it will be recognized that it is also possible to use cDNA from a single cell and compare, for example, the absolute amount of a particular exon in, e.g., a drug-treated or otherwise perturbed cell and an untreated cell.

In other preferred embodiments, single channel detection methods, e.g., using one-color fluorescence labeling, are used (see U.S. patent application Ser. No. 09/781,814, filed on Feb. 12, 2001). In this embodiment, arrays comprising reverse-complement (RC) probes are designed and produced. Because a reverse complement of a DNA sequence has sequence complexity that is equivalent to the corresponding forward-strand (FS) probe that is complementary to a target sequence with respect to a variety of measures (e.g., measures such as GC content and GC trend are invariant under the reverse complement), an RC probe is used as a control probe for determination of the level of non-specific cross hybridization to the corresponding FS probe. The significance of the FS probe intensity of a target sequence is determined by comparing the raw intensity measurement for the FS probe and the corresponding raw intensity measurement for the RC probe in conjunction with the respective measurement errors. In a preferred embodiment, a transcript is called present if the intensity difference between the FS probe and the corresponding RC probe is significant. More preferably, a transcript is called present if the FS probe intensity is also significantly above background level. Single channel detection methods can be used in conjunction with multi-color labeling. In one embodiment, a plurality of different samples, each labeled with a different color, is hybridized to an array. Differences between FS and RC probes for each color are used to determine the level of hybridization of the corresponding sample.
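The FS/RC comparison can be sketched in Python as follows. The specification does not state which statistic is used to decide significance, so the z-score (intensity difference divided by the combined measurement error) and the cutoff of 3.0 are assumptions, as are all function names.

```python
import math

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def gc_content(seq: str) -> float:
    """Fraction of G and C bases; identical for a probe and its RC."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def z_score(a: float, a_err: float, b: float, b_err: float) -> float:
    """Difference between two intensities in units of their combined error."""
    combined = math.hypot(a_err, b_err)
    if combined <= 0:
        raise ValueError("at least one measurement error must be positive")
    return (a - b) / combined


def is_present(fs: float, fs_err: float, rc: float, rc_err: float,
               background: float, background_err: float,
               z_cutoff: float = 3.0) -> bool:
    """Call a transcript present if FS exceeds both RC and background.

    The z_cutoff value is an assumed threshold, not one given in the text.
    """
    above_rc = z_score(fs, fs_err, rc, rc_err) > z_cutoff
    above_background = z_score(fs, fs_err, background, background_err) > z_cutoff
    return above_rc and above_background
```

With multi-color labeling, the same test would simply be applied once per color channel, using that channel's FS and RC intensities.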

When fluorescently labeled probes are used, the fluorescence emissions at each site of a transcript array can be, preferably, detected by scanning confocal laser microscopy. In one embodiment, a separate scan, using the appropriate excitation line, is carried out for each of the two fluorophores used. Alternatively, a laser can be used that allows simultaneous specimen illumination at wavelengths specific to the two fluorophores and emissions from the two fluorophores can be analyzed simultaneously (see Shalon et al., 1996, Genome Res. 6:639-645). In a preferred embodiment, the arrays are scanned with a laser fluorescence scanner with a computer controlled X-Y stage and a microscope objective. Sequential excitation of the two fluorophores is achieved with a multi-line, mixed gas laser, and the emitted light is split by wavelength and detected with two photomultiplier tubes. Such fluorescence laser scanning devices are described, e.g., in Schena et al., 1996, Genome Res. 6:639-645. Alternatively, the fiber-optic bundle described by Ferguson et al., 1996, Nature BioTechnology 14:1681-1684, may be used to monitor mRNA abundance levels at a large number of sites simultaneously.

Signals are recorded and, in a preferred embodiment, analyzed by computer, e.g., using a 12 bit or 16 bit analog to digital board. In one embodiment, the scanned image is despeckled using a graphics program (e.g., Hijaak Graphics Suite) and then analyzed using an image gridding program that creates a spreadsheet of the average hybridization at each wavelength at each site. If necessary, an experimentally determined correction for cross talk (or overlap) between the channels for the two fluors may be made. For any particular hybridization site on the transcript array, a ratio of the emission of the two fluorophores can be calculated. The ratio is independent of the absolute expression level of the cognate gene, but is useful for genes whose expression is significantly modulated by drug administration, gene deletion, or any other tested event.

The relative abundance of an mRNA in two cells or cell lines is preferably scored as perturbed (i.e., the abundance is different in the two sources of mRNA tested) or as not perturbed (i.e., the relative abundance is the same). As used herein, a difference between the two sources of RNA of at least a factor of about 25% (i.e., RNA is 25% more abundant in one source than in the other source), more usually about 50%, even more often by a factor of about 2 (i.e., twice as abundant), 3 (three times as abundant), or 5 (five times as abundant) is preferably scored as a perturbation.
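The scoring rule above can be illustrated with a short Python sketch. It assumes that abundances are positive numbers in arbitrary units and that the difference is expressed as the larger abundance divided by the smaller. The function names and the default threshold of 1.25 (corresponding to 25% more abundant) are choices made for this example.

```python
def fold_difference(abundance_a, abundance_b):
    """Larger abundance divided by the smaller one (always >= 1)."""
    if abundance_a <= 0 or abundance_b <= 0:
        raise ValueError("abundances must be positive")
    return max(abundance_a, abundance_b) / min(abundance_a, abundance_b)


def is_perturbed(abundance_a, abundance_b, min_fold=1.25):
    """Score as perturbed when the two sources differ by at least min_fold.

    min_fold=1.25 corresponds to 25% more abundant in one source; 1.5, 2, 3,
    or 5 give the stricter thresholds mentioned in the text.
    """
    return fold_difference(abundance_a, abundance_b) >= min_fold
```

Because the fold difference is symmetric, the same threshold catches both increases and decreases in abundance.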

It is, however, also advantageous to determine the magnitude of the relative difference in abundances for an mRNA expressed in two cells or in two cell lines. This can be carried out, as noted above, by calculating the ratio of the emission of the two fluorophores used for differential labeling, or by analogous methods that will be readily apparent to those of skill in the art.

5.5.2 Protein Microarrays

In an embodiment of the present invention, the microarray is a protein microarray. As a result, the quality control probe in this embodiment is a polypeptide or peptide. Protein quality control probes preferably have a corresponding binding partner available such that contacting the probe with said binding partner can allow for specific and quantifiable binding.

On a protein microarray, protein probes possessing the ability to bind proteins of interest are immobilized to the surface of a substrate, e.g., a solid support (see, e.g., Goffeau et al., 1996, Science 274:546-567; Aebersold et al., 1999, Nature BioTechnology 10:994-999; Haab et al., 2001, Genome Biology 2:RESEARCH0004.1-RESEARCH0004.13; U.S. Pat. No. 6,346,413). For example, polypeptide probes may be prepared using standard solid-phase techniques for the synthesis of peptides. As is generally known, polypeptides can be prepared using commercially available equipment and reagents following the manufacturers' instructions for blocking interfering groups, protecting the amino acid to be reacted, coupling, deprotection, and capping of unreacted residues. The protein probes may contain non-peptide linkages and/or modified or non-naturally occurring amino acids, e.g., D-amino acids, phosphorous analogs of amino acids, such as α-amino phosphoric acids and β-amino phosphoric acids.

The probes used in the methods of the invention are preferably synthesized on a solid support or surface which may be either porous or non-porous. For example, the probes of the invention may be polypeptide sequences which are attached to a nitrocellulose or nylon membrane or filter. Alternatively, the solid support or surface may be a glass or plastic surface.

Proteins can be synthesized on a positionally addressable array with a plurality of proteins attached to a substrate, with each protein being at a different position on the solid support. Preferably, the plurality of proteins comprises at least 10, 50, 100, 250, 500, 1000, 1500, 2000, 2500, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000, 50000, or 100000 different polypeptides expressed in a single biological sample, plus the quality control probes. Protein probes are synthesized onto the substrate in a step-by-step synthesis using amino acid monomers.

In one embodiment, the quality control probe is an antibody or fragment thereof. In another embodiment, the binding partner of the quality control probe is an antibody or fragment thereof. In a preferred embodiment, the antibody is a monoclonal antibody or fragment (e.g., Fab fragment) thereof (see, e.g., Zhu et al., 2001, Science 293:2101-2105; MacBeath et al., 2000, Science 289:1760-63; de Wildt et al., 2000, Nature BioTechnology 18:989-994).

It will be appreciated that when a sample of protein is bound to a protein microarray under suitable conditions, the level of binding to a particular site in the array will reflect the prevalence of the corresponding binding partner in the sample. The level of binding between a polypeptide quality control probe on the microarray and its protein binding partner is preferably indicated by signaling compounds. For example, when a protein sample is bound to a protein microarray, the site on the array corresponding to a polypeptide probe with a corresponding binding partner not in the sample will have little or no signal, and a polypeptide probe with a corresponding binding partner that is prevalent in the sample will have a relatively strong signal. The relative abundance of different proteins in a sample may be determined by the signal strength pattern of probes on a microarray. In one embodiment, one or more signal compounds (e.g., fluorescent dyes) are directly attached to the protein binding partner of the quality control probe. In another embodiment, one or more signal compounds are attached to the protein binding partner of the quality control probe indirectly (e.g., through the use of fluorescently labeled antibodies).

5.6 Implementation Systems and Methods

The analytical methods of the present invention can preferably be implemented using a computer system, such as the computer system described in this section, according to the following programs and methods. Such a computer system can also preferably store and manipulate a database of the present invention which comprises a compendium of positional information pertaining to the location of quality control probes on the microarray as well as the sequential cycles of synthesis in which they were synthesized (i.e., the vertical placement in the microarray) and which can be used by a computer system in implementing the analytical methods of this invention. Accordingly, such computer systems are also considered part of the present invention. In a specific embodiment, the quality control positional information is stored in digital form in a database.

In a specific embodiment, the computer system comprises one or more processor units and one or more memory units connected to said one or more processor units. Said one or more memory units contain one or more programs which cause said one or more processor units to execute steps of comparing the binding to their binding partner of two or more of the quality control probes on an array of the invention. The result is output, preferably as a binding ratio of the quality control probes. In a specific embodiment, the computer programs cause said one or more processor units to execute steps of:

(a) receiving a first data structure comprising the binding intensity of the quality control probes on the processed microarray, and

(b) comparing said first data structure to a plurality of data structures in a database, each data structure comprising positional information regarding the quality control probes associated with said microarray, to identify the relevant positions on said microarray to compare to assess synthesis integrity, and

(c) comparing the binding of two or more quality control probes.

In a specific embodiment, the computer system comprises a program that causes the processor to compare the appropriate quality control probe binding intensities and thereby determine if the microarray was synthesized correctly.

In another embodiment, the computer system performs one or more aspects of the sample quality control. For example, the computer can read the microarray's quality control probe intensities directly from the raw data represented in a TIFF file of the scanned microarray image, compare the appropriate intensities, and determine if the synthesis of the array is defective, thus resulting in suspect data. If a synthesis defect is identified, the computer could generate a non-conformance report and refrain from automatically adding the suspect data to the database containing microarray processing data until the quality control issues are further addressed. In one embodiment, the computer would generate a non-conformance report if the binding ratio of the quality control probes is not between 0.5 and 2.0.
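The gating logic described above might be sketched as follows. This is a minimal illustration under stated assumptions: probe intensities have already been extracted from the scanned image into a dictionary, the bounds 0.5 and 2.0 are treated as inclusive, and the function names, the record layout, and the example intensities are hypothetical.

```python
def binding_ratio(first, second):
    """Binding of the first probe divided by binding of the second probe."""
    if second <= 0:
        raise ValueError("second intensity must be positive")
    return first / second


def non_conformance_report(intensities, pairs, low=0.5, high=2.0):
    """Return one record per probe pair whose ratio falls outside [low, high]."""
    report = []
    for first, second in pairs:
        ratio = binding_ratio(intensities[first], intensities[second])
        if not low <= ratio <= high:
            report.append({"pair": (first, second), "ratio": round(ratio, 3)})
    return report


def accept_for_database(intensities, pairs):
    """Suspect data are withheld: accept only when the report is empty."""
    return not non_conformance_report(intensities, pairs)


# Illustrative intensities only (arbitrary units), not measured data.
good = {"25mer": 0.050, "45mer": 0.045, "60mer": 0.055}
bad = {"25mer": 0.005, "45mer": 0.045, "60mer": 0.055}
pairs = [("25mer", "45mer"), ("25mer", "60mer")]
```

Here `accept_for_database(good, pairs)` passes, whereas `bad` yields two report entries and would be held back until the quality control issue is addressed.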

An exemplary computer system suitable for implementing the analytic methods of this invention preferably comprises internal components linked to external components. The internal components of this computer system include a processor element interconnected with a main memory. For example, the computer system can be based on an Intel Pentium® processor of 200 MHz or greater clock rate and with 32 MB or more main memory. In a preferred embodiment, the computer system is a cluster of a plurality of computers comprising a head “node” and eight sibling “nodes”, with each node having a central processing unit (CPU). In addition, the cluster also comprises at least 128 MB of random access memory (RAM) on the head node and at least 256 MB of RAM on each of the eight sibling nodes. Therefore, the computer systems of the present invention are not limited to those consisting of a single memory unit or a single processor unit.

The external components can include a mass storage. This mass storage can be one or more hard disks that are typically packaged together with the processor and memory. Such hard disks are typically of 1 GB or greater storage capacity and more preferably have at least 6 GB of storage capacity. For example, in a preferred embodiment, described above, wherein a computer system of the invention comprises several nodes, each node can have its own hard drive. The head node preferably has a hard drive with at least 6 GB of storage capacity whereas each sibling node preferably has a hard drive with at least 9 GB of storage capacity. A computer system of the invention can further comprise other mass storage units including, for example, one or more floppy drives, one or more CD-ROM drives, one or more DVD drives or one or more DAT drives.

Other external components typically include a user interface device, which is most typically a monitor and a keyboard together with a graphical input device such as a “mouse”. The computer system is also typically linked to a network link which can be, e.g., part of a local area network (LAN) to other, local computer systems and/or part of a wide area network (WAN), such as the Internet, that is connected to other, remote computer systems. For example, in the preferred embodiment, discussed above, wherein the computer system comprises a plurality of nodes, each node is preferably connected to a network, preferably an NFS network, so that the nodes of the computer system communicate with each other and, optionally, with other computer systems by means of the network and can thereby share data and processing tasks with one another.

Loaded into memory during operation of such a computer system are several software components. The software components comprise both software components that are standard in the art and components that are special to the present invention. These software components are typically stored on mass storage such as the hard drive, but can be stored on other computer readable media as well including, for example, one or more floppy disks, one or more CD-ROMs, one or more DVDs or one or more DATs. The software component represents an operating system which is responsible for managing the computer system and its network interconnections. The operating system can be, for example, of the Microsoft Windows™ family such as Windows 95, Windows 98, Windows NT or Windows 2000. Alternatively, the operating software can be a Macintosh operating system, a UNIX operating system or the LINUX operating system. The software components comprise common languages and functions that are preferably present in the system to assist programs implementing methods specific to the present invention. Languages that can be used to program the analytic methods of the invention include, for example, C and C++, FORTRAN, PERL, HTML, JAVA, and any of the UNIX or LINUX shell command languages such as C shell script language. The methods of the invention can also be programmed or modeled in mathematical software packages that allow symbolic entry of equations and high-level specification of processing, including specific algorithms to be used, thereby freeing a user of the need to procedurally program individual equations and algorithms. Such packages include, e.g., Matlab from Mathworks (Natick, Mass.), Mathematica from Wolfram Research (Champaign, Ill.) or S-Plus from MathSoft (Seattle, Wash.).

The software component comprises analytic methods of the present invention, preferably programmed in a procedural language or symbolic package. For example, the software component preferably includes programs that cause the processor to implement steps of accepting a plurality of positional data for each quality control probe on each microarray and storing the data in the memory. For example, the computer system can accept data manually entered by a user (e.g., by means of the user interface). Alternatively, however, the programs cause the computer system to retrieve quality control probe information from a database. Such a database can be stored on a mass storage (e.g., a hard drive) or other computer readable medium and loaded into the memory of the computer, or the database can be accessed by the computer system by means of the network.

In one embodiment, the computer readable medium contains an encoded data structure comprising:

(a) a digital representation of the position of the quality control probes on the microarray, and

(b) a digital representation of the cycles of synthesis at which each quality control probe was synthesized.
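One possible encoding of such a data structure is sketched below in Python. The record layout (row and column for position, first and last synthesis cycle for the vertical placement), the class name `QCProbeRecord`, and the use of JSON as the storage format are all assumptions made for illustration; no particular encoding is prescribed herein.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class QCProbeRecord:
    row: int          # position on the array (row)
    column: int       # position on the array (column)
    first_cycle: int  # synthesis cycle that deposits the first monomer
    last_cycle: int   # synthesis cycle that deposits the last monomer

    def cycles(self):
        """Synthesis cycles during which this probe was built."""
        return range(self.first_cycle, self.last_cycle + 1)


def encode(records):
    """Serialize records to a JSON string for storage on a readable medium."""
    return json.dumps([asdict(record) for record in records])


def decode(text):
    """Rebuild the records from their JSON representation."""
    return [QCProbeRecord(**item) for item in json.loads(text)]
```

For example, a probe built on a 20 nucleotide spacer could be stored as `QCProbeRecord(row=3, column=7, first_cycle=21, last_cycle=45)`, and `decode(encode([...]))` returns an equal list of records.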

In another embodiment, control microarrays with intentional defects can be processed and signal intensity patterns and ratios can be stored. The present invention also encompasses a process by which the signal intensity(ies) and/or resulting ratios from the sample microarray are compared to the database containing a compendium of known errors. Should a match be found in the database, the defect in the sample microarray can be determined.
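A comparison against a compendium of known errors could, for instance, be sketched as a nearest-pattern lookup. In the Python sketch below, the compendium entries are the ratio pairs (25mer/45mer, 25mer/60mer) reported in Table 1 for three intentional defects. The distance measure (Euclidean distance between log10 ratios), the tolerance of 0.2, and all names are assumptions made for this example rather than a matching procedure defined herein.

```python
import math

# Ratio patterns (25mer/45mer, 25mer/60mer) taken from Table 1.
KNOWN_DEFECTS = {
    "cycle 1 skipped": (0.28, 0.20),
    "cycles 1-2 skipped": (0.10, 0.07),
    "cycle 36 skipped": (5.47, 2.30),
}


def log_distance(pattern_a, pattern_b):
    """Euclidean distance between two ratio patterns on a log10 scale."""
    return math.sqrt(sum((math.log10(a) - math.log10(b)) ** 2
                         for a, b in zip(pattern_a, pattern_b)))


def match_defect(observed, compendium=KNOWN_DEFECTS, tolerance=0.2):
    """Return the closest known defect, or None if nothing is within tolerance."""
    best = min(compendium, key=lambda name: log_distance(observed, compendium[name]))
    if log_distance(observed, compendium[best]) <= tolerance:
        return best
    return None
```

Working on a log scale treats a two-fold increase and a two-fold decrease as equally large deviations, which suits ratio data.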

In addition to the exemplary program structures and computer systems described herein, other, alternative program structures and computer systems will be readily apparent to the skilled artisan. Such alternative systems, which do not depart from the above described computer system and program structures either in spirit or in scope, are therefore intended to be comprehended within the accompanying claims.

The following examples are presented by way of illustration of the present invention, and are not intended to limit the present invention in any way.

6. EXAMPLE 1 Quality Control using Quality Control Probes

6.1 Demonstration of Synthesis Error

The inkjet writer uses two inkjet heads for distributing phosphoramidites or activator onto the glass substrate of the array. Each head contains three sets of 20 nozzles with each 20-nozzle set dedicated for depositing either a single phosphoramidite or the activator. The 20 nozzles in a set are arranged in two interlaced columns of ten (see FIG. 1). This pattern allows for the deposition of 20 rows of bases per pass of the inkjet heads, with each nozzle being responsible for a single row. Because each nozzle is responsible for a particular row, any clog or other nozzle malfunction can result in all or a portion of rows being deleted or synthesized inefficiently (detected by a reduction of intensity in the affected quality control probes) with a 20 row periodicity. FIG. 1 shows a 25,000 oligonucleotide array synthesized with three clogged nozzles (i.e., nozzles 4, 15, and 20).
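The 20 row periodicity suggests a simple way to trace low-intensity rows back to a nozzle, sketched below in Python. The sketch assumes that rows are numbered from 1 and that row r is written by nozzle ((r - 1) mod 20) + 1. That mapping, the 50% threshold, and the function names are illustrative assumptions, since the actual row-to-nozzle assignment depends on the writer.

```python
from collections import Counter

NOZZLES_PER_SET = 20


def nozzle_for_row(row):
    """Nozzle (1-20) assumed to write a given 1-based array row."""
    return (row - 1) % NOZZLES_PER_SET + 1


def suspect_nozzles(low_rows, total_rows, min_fraction=0.5):
    """Nozzles for which at least min_fraction of their rows show low intensity."""
    hits = Counter(nozzle_for_row(row) for row in low_rows)
    suspects = []
    for nozzle in range(1, NOZZLES_PER_SET + 1):
        rows_written = len(range(nozzle, total_rows + 1, NOZZLES_PER_SET))
        if rows_written and hits[nozzle] / rows_written >= min_fraction:
            suspects.append(nozzle)
    return suspects
```

On a 100 row array in which every row written by nozzles 4, 15, and 20 is dim, the function returns `[4, 15, 20]`; an isolated dim row does not implicate its nozzle.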

6.2 Synthesis Failure Detection

Stilted quality control probes are depicted schematically in FIG. 2A. A 25 nucleotide predetermined binding sequence (depicted by a solid line) is synthesized either directly on the microarray (so that the sequence is made at synthesis cycles 1-25) or on spacers (depicted by a dashed line). The spacers are shown to be either 20 nucleotides long (so that the sequence is made at synthesis cycles 21-45) or 35 nucleotides long (so that the sequence is made at synthesis cycles 36-60). Should there be no synthesis defects during oligonucleotide microarray synthesis, then the reverse complement of the predetermined binding sequence should hybridize equally well to the predetermined binding sequence on all of the quality control probes and give comparable signals.

FIG. 2B schematically depicts a synthesis defect in synthesis cycle 24 of the oligonucleotide microarray (depicted by the striped bar). Because this affects the sequence of the predetermined binding sequence when it is either on no spacer or on a 20 nucleotide spacer, hybridization to its reverse complement will be decreased when compared to the level of binding that is observed with no synthesis error. The predetermined binding sequence on a 35 nucleotide spacer is unaffected, however, and thus it should hybridize to its reverse complement to the same degree as when no synthesis error was present.
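The relationship between spacer length and the synthesis cycles that build the binding sequence can be made explicit with a short Python sketch. It assumes one monomer per synthesis cycle, a 25 nucleotide binding sequence, and spacers of 0, 20, and 35 nucleotides as described above; the function names are hypothetical.

```python
BINDING_LENGTH = 25


def binding_cycles(spacer_length, binding_length=BINDING_LENGTH):
    """Synthesis cycles that build the binding sequence on a given spacer."""
    start = spacer_length + 1
    return range(start, start + binding_length)


def affected_spacers(defect_cycle, spacer_lengths=(0, 20, 35)):
    """Spacer lengths whose binding sequence includes the defective cycle."""
    return [spacer for spacer in spacer_lengths
            if defect_cycle in binding_cycles(spacer)]
```

A defect in cycle 24 gives `[0, 20]`, matching FIG. 2B: the probes with no spacer and with the 20 nucleotide spacer are affected, and the probe on the 35 nucleotide spacer is not.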

A quality control probe having the sequence of SEQ ID NO:1 was synthesized on an ink jet oligonucleotide microarray with either no spacer (with total length of 25 nucleotides), on a 20 nucleotide spacer (with total length of 45 nucleotides), or on a 35 nucleotide spacer (with total length of 60 nucleotides).

5′ ATCATCGTAGCTGGTCAGTGTATCC 3′ (SEQ ID NO:1)

The fluorescently labeled reverse complement of SEQ ID NO:1 was used to hybridize to the oligonucleotide microarray. When there were no synthesis defects during oligonucleotide microarray synthesis, all of the quality control probes hybridized to their reverse complement equally well (FIG. 4). This was shown by the comparable levels of hybridization to a fluorescently labeled reverse complementary nucleotide after microarray processing (see FIGS. 4A-4B). Fluorescent intensity for each quality control probe was quantified in duplicate on two microarrays and is given in Table 1. Ratios of average fluorescent intensity of the 25mer to the average fluorescent intensity of the 45mer or 60mer approach 1 and indicate that all bound to their reverse complement comparably.

Similar experiments were conducted with various synthesis cycles being defective during microarray synthesis in order to ascertain the sensitivity of the quality control probes. When the first (FIG. 5) or first and second (FIG. 6) synthesis cycles were skipped during synthesis, only the 25mer hybridization to its complementary fluorescently labeled oligonucleotide was affected (FIGS. 5A-5B and 6A-6B). Both ratios in Table 1 show a decrease with respect to ratios seen when no synthesis cycles are skipped. When the thirty sixth (FIG. 7) or thirty fourth and thirty fifth (FIG. 8) synthesis cycles were skipped, both the 45mer and 60mer hybridization to their complementary fluorescently labeled oligonucleotides were affected (FIGS. 7A-7B and 8A-8B). Both ratios in Table 1 show an increase with respect to ratios seen when no synthesis cycles are skipped. When there was inefficient synthesis in the first twenty two synthesis cycles (FIG. 9), only the 25mer hybridization to its complementary fluorescently labeled oligonucleotide was severely affected (FIGS. 9A-9B). Both ratios in Table 1 show a decrease with respect to ratios seen when no synthesis cycles are skipped or inefficient.

TABLE 1

synthesis                                            ratio of   ratio of
cycles                                               25 mer/    25 mer/
affected    probe     array 1   array 2   average    45 mer     60 mer

None        25 mer    0.0628    0.0495    0.0562     1.28       1.03
            45 mer    0.0413    0.0399    0.0406
            60 mer    0.535     0.0555    0.0545
1           25 mer    0.0133    0.0149    0.0141     0.28       0.20
            45 mer    0.0461    0.0536    0.0499
            60 mer    0.0656    0.0770    0.0713
1-2         25 mer    0.0056    0.0044    0.005      0.10       0.07
            45 mer    0.0532    0.0442    0.0476
            60 mer    0.0793    0.0675    0.0734
36          25 mer    0.0692    0.0730    0.0711     5.47       2.30
            45 mer    0.0120    0.0140    0.013
            60 mer    0.0278    0.0339    0.0309
34-35       25 mer    0.1028    0.0644    0.0836     42.0       1.87
            45 mer    0.0020    0.0019    0.0195
            60 mer    0.0471    0.0427    0.0449
1-22        25 mer    0.0024    0.0020    0.0022     0.008      0.008
            45 mer    0.2165    0.3682    0.2924
            60 mer    0.2640    0.2962    0.2801
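The ratio columns of Table 1 are obtained by dividing the average 25mer intensity by the average 45mer or 60mer intensity. The Python sketch below recomputes them for two rows of the table (cycle 1 skipped and cycle 36 skipped) and adds a simple reading of the direction of the deviation. The function names and the use of 0.5 and 2.0 as decision bounds in `interpret` are assumptions for illustration.

```python
# Average intensities copied from Table 1 for two of the defect conditions.
TABLE_1_AVERAGES = {
    "1": {"25mer": 0.0141, "45mer": 0.0499, "60mer": 0.0713},
    "36": {"25mer": 0.0711, "45mer": 0.013, "60mer": 0.0309},
}


def table_ratios(averages):
    """Return (25mer/45mer, 25mer/60mer) rounded to two decimals."""
    return (round(averages["25mer"] / averages["45mer"], 2),
            round(averages["25mer"] / averages["60mer"], 2))


def interpret(ratio, low=0.5, high=2.0):
    """Rough reading of a ratio: which side of the comparison lost signal."""
    if ratio < low:
        return "25mer reduced"
    if ratio > high:
        return "spacer probe reduced"
    return "comparable"
```

For the cycle 1 row this gives ratios of 0.28 and 0.20 ("25mer reduced"), and for the cycle 36 row 5.47 and 2.30 ("spacer probe reduced"), in agreement with the tabulated values.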

7. EXAMPLE 2 Quality Control using Quality Control Probes with No Spacers

7.1 Synthesis Failure Detection

Staggered start quality control probes are depicted schematically in FIG. 3A. A series of 25 nucleotide predetermined binding sequences (depicted by a bold line) are synthesized directly on the microarray, with the synthesis of individual probe(s) starting at every synthesis cycle (from synthesis cycle 1-36). Unlike the above strategy, no spacers are used so that all of the quality control probes are made up exclusively of predetermined binding sequences that are 25 nucleotides long. The only difference between the quality control probes is the cycle at which synthesis begins (the bold line depicts the quality control probe and the thin line depicts synthesis cycles that had no monomer deposited). The synthesis cycles that make up each quality control probe are listed above each probe in FIG. 3A. Should there be no synthesis defects during oligonucleotide microarray synthesis, then the reverse complement of the probe sequence should hybridize equally well to all of the quality control probes and give comparable signals.

FIG. 3B schematically depicts a synthesis defect in synthesis cycle 29 of the oligonucleotide microarray (depicted by the gray bar). Because this affects all of the predetermined binding sequences that have synthesis cycle 29 as part of their sequence (i.e., those quality control probes that begin at synthesis cycles 5-29), hybridization of the reverse complement will be decreased in these quality control probes when compared to the level of binding that is observed with no synthesis error. Quality control probes that do not contain a monomer deposited during synthesis cycle 29 (i.e., those quality control probes that begin synthesis at cycles 1-4 or 30-35) are unaffected, however, and thus they should hybridize to their reverse complement to the same degree as when no synthesis error was present.
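The staggered start logic can be checked with a few lines of Python. The sketch assumes 25 nucleotide probes with start cycles 1 through 35 and one monomer per cycle. The helper `locate_defect` additionally assumes a single defective cycle and perfectly clean classification of probes as low or normal; both helpers and their names are illustrative only.

```python
def probe_cycles(start_cycle, length=25):
    """Synthesis cycles used by a probe whose synthesis begins at start_cycle."""
    return range(start_cycle, start_cycle + length)


def affected_starts(defect_cycle, starts=range(1, 36), length=25):
    """Start cycles of the probes that include the defective cycle."""
    return [s for s in starts if defect_cycle in probe_cycles(s, length)]


def locate_defect(low_starts, starts=range(1, 36), length=25):
    """Cycles shared by every low-signal probe and by no normal probe.

    Assumes a single defective cycle; returns a sorted list of candidates.
    """
    low = set(low_starts)
    if not low:
        return []
    candidates = set.intersection(*(set(probe_cycles(s, length)) for s in low))
    for s in starts:
        if s not in low:
            candidates -= set(probe_cycles(s, length))
    return sorted(candidates)
```

A defect in cycle 29 affects the probes starting at cycles 5 through 29, and feeding those start cycles back into `locate_defect` recovers cycle 29 as the only candidate.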

A quality control probe having the sequence of SEQ ID NO:1 was synthesized on an ink jet oligonucleotide microarray using a staggered start. The quality control sequence was started at every progressive synthesis cycle from 1 to 35 during the synthesis of the microarray. The fluorescently labeled reverse complement of SEQ ID NO:1 was used to hybridize to the oligonucleotide microarray.

When there was inefficient synthesis in the first and second synthesis cycles during oligonucleotide microarray synthesis, only the first two staggered start quality control probes were affected (FIG. 10). The mean fluorescence intensity of the quality control probes at each synthesis cycle was plotted and showed a decrease in intensity only at probes that contained part of their quality control probe sequence at the first and/or second synthesis cycles of the microarray (FIG. 10B). All of the quality control probes that had synthesis that started subsequent to the second synthesis cycle were unaffected and hybridized to their reverse complement equally well. Similar results were seen when there was inefficient synthesis in the first five synthesis cycles (FIG. 11), the first eight synthesis cycles (FIG. 12), or the last fifteen synthesis cycles (FIG. 13) during oligonucleotide microarray synthesis. In each case, fluorescent intensity decreased only for quality control probes that had monomers that contributed part of the sequence deposited at the affected synthesis cycles of the microarray.

8. EXAMPLE 3 Increased Sensitivity of Quality Control Probes

8.1 Using Deletions

A synthesis failure during oligonucleotide microarray synthesis such that one or more synthesis cycles are compromised decreases the degree of binding of the quality control probe to its fluorescently labeled reverse complementary oligonucleotide (see, e.g., Sections 6.2 and 7.1 above). However, in instances where only a small number of synthesis cycles are compromised (i.e., one or two) such that the quality control probe is now slightly less than full length (i.e., a 24mer or 23mer relative to a full length 25mer), binding to its reverse complementary oligonucleotide can still be relatively robust. In order to increase the sensitivity of synthesis failure detection, quality control probes with predetermined binding sequences already containing a single deletion were used in the methods of the invention. Such quality control probes had a predetermined binding sequence synthesized with a deletion in the nineteenth residue (from the 5′ end) of SEQ ID NO: 1. Any additional deletions due to a failure during microarray synthesis would exacerbate the defect and result in an increased deficiency in the ability to bind to the reverse complement of the full length 25mer sequence. FIG. 14 shows that a single-deletion quality control probe on a microarray with synthesis defects in the thirty fourth and thirty fifth synthesis cycles is more sensitive than a quality control probe with no deletions.

5′ ATCATCGTAGCTGGTCAGGTATCC 3′ (SEQ ID NO: 2)
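The relationship between the two sequences can be verified with a short Python sketch: removing the nineteenth residue (counted from the 5′ end) of SEQ ID NO: 1 yields SEQ ID NO: 2. The helper names are hypothetical, and `reverse_complement` is included only to show how the sequence of the labeled hybridization target follows from the full-length sequence.

```python
SEQ_ID_NO_1 = "ATCATCGTAGCTGGTCAGTGTATCC"
SEQ_ID_NO_2 = "ATCATCGTAGCTGGTCAGGTATCC"

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def delete_residue(sequence, position):
    """Remove the residue at a 1-based position counted from the 5' end."""
    if not 1 <= position <= len(sequence):
        raise ValueError("position outside the sequence")
    return sequence[:position - 1] + sequence[position:]


def reverse_complement(sequence):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(sequence))
```

Calling `delete_residue(SEQ_ID_NO_1, 19)` returns the 24 nucleotide sequence of SEQ ID NO: 2.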

Labeled reverse complement of the full-length 25 nucleotide predetermined binding sequence was used to hybridize with quality control probes on each microarray. The mean fluorescence intensity plot of the quality control probes at each synthesis cycle was determined for each microarray. The full length quality control probe shows a synthesis defect starting at the fifteenth synthesis cycle (FIG. 14A). The single-deletion quality control probe shows a synthesis error starting at the eleventh synthesis cycle (FIG. 14B). Thus the single-deletion quality control probe is a more sensitive measure of microarray quality.

8.2 In Comparison With Correlation Plots

These experiments show that using microarrays that contain one or more defects can provide data that, on the surface, looks acceptable. However, when the data is compared to data from microarrays with no defects, the problems become apparent. Correlation plots assess the quality of the data by examining the reproducibility of an experiment (e.g., using fluor-reversed pair analysis). Correlations between fluor reversed pairs were plotted for microarrays that had defects in the first twenty two synthesis cycles (FIG. 15A) and microarrays that had no synthesis defects (FIG. 15B). Oligonucleotides were labeled with either red or green fluorescent dye and a mixture was used to hybridize to each microarray. The log10 of the ratio of red to green fluorescent signal was plotted against the log10 of the ratio of red to green fluorescent signal for a duplicate chip. When data from a microarray with the first 22 cycles of synthesis skipped was compared to itself, no problem was detected (FIG. 15A). Similarly, when data from a non-defective microarray was compared to itself, no problem was detected (FIG. 15B). However, when data from a microarray with the first 22 cycles of synthesis skipped was compared to the data from a non-defective microarray, there is a difference (FIG. 15C). Even a defective microarray will result in data. Because it is not known beforehand what the data should look like, the data from defective arrays may initially look acceptable. The use of quality control probes according to the invention safeguards against using poor quality data.
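For reference, the quantity plotted in such correlation plots can be computed as sketched below in Python. The sketch assumes that per-probe red and green intensities are available as lists of positive numbers. The use of a Pearson correlation coefficient as a summary of the plot is an assumption made for this example, as are the function names.

```python
import math


def log_ratios(red, green):
    """log10 of the red-to-green signal ratio for each probe."""
    return [math.log10(r / g) for r, g in zip(red, green)]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation undefined for constant data")
    return cov / (spread_x * spread_y)
```

A high correlation between duplicate chips shows only that the two chips agree with each other; as the text notes, two chips that share the same defect can agree equally well.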

Similar experiments were conducted with oligonucleotide microarrays that had the first (FIG. 16A), first and second (FIG. 16B), thirty sixth (FIG. 16C), or thirty fourth and thirty fifth (FIG. 16D) synthesis cycles skipped during synthesis. Data from oligonucleotide hybridization to the defective microarrays were plotted against data from non-defective microarrays. Again the plots all look similar and no synthesis defect would have been detected. This demonstrates that analysis of microarrays with correlation plots is not sensitive enough to identify defective microarrays.

9. REFERENCES CITED

All references cited herein are incorporated herein by reference in their entirety and for all purposes to the same extent as if each individual publication or patent or patent application was specifically and individually indicated to be incorporated by reference in its entirety for all purposes.

Many modifications and variations of the present invention can be made without departing from its spirit and scope, as will be apparent to those skilled in the art. The specific embodiments described herein are offered by way of example only, and the invention is to be limited only by the terms of the appended claims along with the full scope of equivalents to which such claims are entitled.

1. A positionally addressable array comprising a substrate to which are attached a plurality of different biopolymer probes, said different biopolymer probes in said plurality being situated at different positions on said surface and being the product of a step-by-step synthesis of said biopolymer probes on said substrate, said plurality of different binding probes comprising a plurality of quality control probes, each quality control probe in said plurality comprising (i) the same predetermined binding sequence or (ii) a different predetermined binding sequence with the same binding specificity, the synthesis of said predetermined binding sequence in each said quality control probe having been initiated during said step-by-step synthesis at sequential cycles of synthesis.
 2. The array of claim 1 wherein the sequence of each said quality control probe of said plurality consists of said predetermined binding sequence.
 3. The array of claim 1 wherein said plurality of quality control probes comprise a second sequence consisting of a chemical structure contiguous with said predetermined binding sequence, wherein at least some of said quality control probes differ from other of said quality control probes in the length of said chemical structure.
 4. The array of claim 3 wherein said chemical structure is a sequence of number 0 to N monomers contiguous with said predetermined binding sequence, and where N is a whole number equal to or greater than 1.
 5. A method of determining if a positionally-addressable biopolymer array has a synthesis defect comprising the following steps in the order stated: a) contacting the array of any of claims 1-2 with a sample comprising a binding partner that binds said predetermined binding sequence; b) detecting or measuring binding between two or more of said quality control probes and said binding partner in the sample; and c) comparing binding of said two or more of said quality control probes, wherein if said binding is similar, the absence of a synthesis defect between said sequential cycles of synthesis of said array is indicated.
 6. A method of determining if a positionally-addressable biopolymer array has a synthesis defect comprising the following steps in the order stated: a) contacting the array of claim 3 with a sample comprising a binding partner that binds said predetermined binding sequence; b) detecting or measuring binding between (i) two or more of said quality control probes that differ in the number of said monomers; and (ii) said binding partner in the sample; and c) comparing binding of said two or more of said quality control probes; wherein if said binding is similar, the absence of a synthesis defect between said sequential cycles of synthesis used to synthesize said two or more quality probes is indicated.
 7. The method of claim 5 wherein said comparing comprises determining the binding ratio of two of said two or more quality control probes, wherein said binding ratio is the amount of binding of a first of said two quality control probes with said binding partner, divided by the amount of binding of a second of said two quality control probes with said binding partner, and wherein said binding ratio between 0.5 and 2.0 indicates the absence of said synthesis defect.
 8. The method of claim 6 wherein said comparing comprises determining the binding ratio of two of said two or more quality control probes, wherein said binding ratio is the amount of binding of a first of said two quality control probes with said binding partner, divided by the amount of binding of a second of said two quality control probes with said binding partner, and wherein said binding ratio between 0.5 and 2.0 indicates the absence of said synthesis defect.
 9. The method of claim 6 further comprising before step (a) the step of synthesizing said array.
 10. The method of claim 6 wherein said sample comprises (i) total cellular RNA or mRNA from one or more cells or a plurality of nucleic acids derived therefrom, and (ii) said binding partner, wherein said binding partner is not expressed by said cells.
 11. The array of claim 2, 3, or 4 wherein said biopolymer probes are nucleic acids.
 12. The array of claim 11 wherein said predetermined binding sequence is in the range of 10-40 nucleotides in length.
 13. The array of claim 11 wherein said biopolymer probes consist of a sequence in the range of 20-100 nucleotides.
 14. The array of claim 12 wherein said predetermined binding sequence is 25 nucleotides in length.
 15. The array of claim 14 wherein said predetermined binding sequence is SEQ ID NO:1 or a complement thereof.
 16. The array of claim 2, 3, or 4 wherein said biopolymer probes are proteins.
 17. The array of claim 16 wherein said proteins are antibodies.
 18. The array of claim 2 wherein said predetermined binding sequence of said quality control biopolymer probe is between 10% and 75% of the length of the biopolymer probes on the array that are not said quality control probes.
 19. The array of claim 18 wherein said predetermined binding sequence consists of 25 monomers, and wherein said biopolymer probes on the array that are not said quality control probes consist of 60 monomers.
 20. The array of claim 4 wherein N is not greater than the number of monomers in said biopolymer probes on the array that are not said quality control biopolymer probes minus the number of monomers in said predetermined binding sequence.
 21. The array of claim 4 wherein N is greater than the number of monomers in said biopolymer probes on the array that are not said quality control biopolymer probes minus the number of monomers in said predetermined binding sequence.
 22. The array of claim 4 which comprises three of said quality control probes that differ in N.
 23. The array of claim 22 wherein N is 0, 20, and 35, respectively, for different quality control probes.
 24. A method of making a positionally-addressable array of a plurality of different biopolymer probes comprising synthesizing said plurality of different biopolymer probes on a substrate from monomers using a step-by-step synthesis such that each of said different biopolymer probes is attached to said substrate at a different position on said substrate, wherein said plurality of different biopolymer probes comprise a plurality of quality control probes, each quality control probe in said plurality comprising the same predetermined binding sequence, wherein the synthesis of said predetermined binding sequence in each of said quality control probes is initiated during said step-by-step synthesis at sequential cycles of synthesis.
 25. The method of claim 24 wherein the sequence of each said quality control probe of said plurality consists of said predetermined binding sequence.
 26. The method of claim 24 wherein said plurality of quality control probes comprise a second sequence of number 0 to N monomers contiguous with said predetermined binding sequence, wherein at least some of said quality control probes differ from other of said quality control probes in the number of said monomers, and where N is a whole number equal to or greater than 1.
 27. The array of claim 1 wherein said plurality of quality control probes comprise (i) quality control probes whose sequence consists of said predetermined sequence; and (ii) quality control probes that comprise a second sequence of number 0 to N monomers contiguous with said predetermined binding sequence, wherein at least some of said quality control probes differ from other of said quality control probes in the number of said monomers, and where N is a whole number equal to or greater than 1.
 28. The array of claim 27 wherein said biopolymer probes are oligonucleotides, said predetermined sequence consists of 25 nucleotides, and said biopolymer probes that are not said quality control probes consist of 60 nucleotides.
 29. An oligonucleotide comprising a nucleotide sequence of SEQ ID NO:1 or SEQ ID NO:2 or the complement thereof.
 30. A positionally addressable array comprising a substrate to which are attached a plurality of different biopolymer probes, said different biopolymer probes in said plurality being situated at different positions on said surface and being the product of a step-by-step addition of monomers to said biopolymer probes on said substrate, said plurality of different biopolymer probes comprising a plurality of quality control probes, each quality control probe in said plurality comprising at least one labeled monomer, the addition of said labeled monomer to said quality control probe having been initiated during said step-by-step synthesis at sequential cycles of synthesis.
 31. A method of determining if the positionally-addressable biopolymer array of claim 30 has a synthesis defect comprising comparing the signal from said at least one labeled monomer of two or more of said quality control probes, wherein if said signal is similar, the absence of a synthesis defect between said sequential cycles of synthesis of said array is indicated.
 32. The method of claim 31 wherein said comparing comprises determining the signal ratio of two of said two or more quality control probes, wherein said signal ratio is the amount of signal emitted from a first of said two quality control probes divided by the amount of signal emitted from a second of said two quality control probes, and wherein said signal ratio between 0.5 and 2.0 indicates the absence of said synthesis defect.
 33. The array of claim 30 wherein said biopolymer probes are nucleic acids.
 34. The array of claim 33 wherein said biopolymer probes consist of a sequence in the range of 20-100 nucleotides.
 35. The array of claim 30 wherein said biopolymer probes are proteins.
 36. The array of claim 35 wherein said proteins are antibodies.
 37. The method of any one of claims 6 and 31 wherein said synthesis defect is a nozzle failure.
 38. The method of claim 37 wherein said array comprises at least a portion of said quality control probes arranged in a periodicity of P and wherein said array is synthesized by step-by-step synthesis using an inkjet printhead with P nozzles, and where P is a whole number equal to or greater than 1.
 39. The method of claim 38 wherein P equals 20.
 40. A method of detecting a nozzle failure using the positionally addressable array of claim 1 or 2 comprising the following steps in the order stated: a) contacting the array of any of claims 1 or 2 with a sample comprising a binding partner that binds said predetermined binding sequence, wherein at least a portion of said plurality of quality control probes is arranged in a periodicity of P and wherein said array is synthesized by step-by-step synthesis using an inkjet printhead with P nozzles, wherein P is a whole number equal to or greater than 1; b) detecting or measuring binding between two or more of said quality control probes and said binding partner in the sample; and c) comparing binding of said two or more of said quality control probes in a periodicity of P, wherein if said binding is similar, the absence of a nozzle defect is indicated.
 41. A method of detecting a nozzle failure using the positionally addressable array of claim 30 comprising comparing the signal from said at least one labeled monomer of two or more of said quality control probes in a periodicity of P, wherein at least a portion of said plurality of quality control probes is arranged in a periodicity of P and wherein said array is synthesized by a step-by-step synthesis using an inkjet printhead with P nozzles, wherein if said signal is similar, the absence of a nozzle defect is indicated, and wherein P is a whole number equal to or greater than 1.
 42. The method of claim 5 further comprising before step (a) the step of synthesizing said array.
 43. The method of claim 5 wherein said sample comprises (i) total cellular RNA or mRNA from one or more cells or a plurality of nucleic acids derived therefrom, and (ii) said binding partner, wherein said binding partner is not expressed by said cells.
 44. The method of claim 5 wherein said synthesis defect is a nozzle failure.
 45. The method of claim 44 wherein said array comprises at least a portion of said quality control probes arranged in a periodicity of P and wherein said array is synthesized by step-by-step synthesis using an inkjet printhead with P nozzles, and where P is a whole number equal to or greater than 1.
 46. The method of claim 45 wherein P equals 20.
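
By way of illustration only, the comparisons recited in the method claims (a binding or signal ratio between 0.5 and 2.0, and a comparison of quality control probes arranged in a periodicity of P) might be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the claimed method itself: the function names, the inclusive treatment of the range endpoints, the assumption that the quality control probe at index i was printed by nozzle i mod P, and the use of the median of per-nozzle means as a reference are all hypothetical choices made for the example.

```python
from itertools import combinations
from statistics import mean, median


def binding_ratio(first, second):
    """Signal of a first QC probe divided by the signal of a second QC probe."""
    if second <= 0:
        raise ValueError("second signal must be positive")
    return first / second


def defect_free(signals, low=0.5, high=2.0):
    """True when every pair of QC probe signals has a ratio within [low, high].

    Treating the endpoints as inclusive is an assumption of this sketch.
    """
    return all(
        low <= binding_ratio(a, b) <= high
        for a, b in combinations(signals, 2)
    )


def suspect_nozzles(signals, p, low=0.5, high=2.0):
    """Return indices of nozzles whose QC probes deviate from the rest.

    Assumption: signals are listed in print order and probe i was
    deposited by nozzle i % p (hypothetical layout for illustration).
    """
    if p < 1 or len(signals) < p:
        raise ValueError("need p >= 1 and at least one signal per nozzle")
    # Average the QC signals attributed to each nozzle.
    per_nozzle = [mean(signals[k::p]) for k in range(p)]
    # Use the median nozzle as the reference (an assumption of this sketch).
    reference = median(per_nozzle)
    if reference <= 0:
        raise ValueError("reference signal must be positive")
    return [
        k for k, m in enumerate(per_nozzle)
        if not low <= m / reference <= high
    ]
```

For example, `defect_free([1000, 950, 1100])` evaluates to true because every pairwise ratio lies between 0.5 and 2.0, whereas a set of signals in which one probe is several-fold dimmer than the others does not. With P = 3 and signals in which every third probe is weak, `suspect_nozzles` returns the index of the nozzle assumed to have printed those probes.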